H1 vs H2 analyses

I thought we could use a dedicated thread for all the analyses done on H1 and H2.

I’d like to start by showing that there is no BRS difference caused by the H1 vs H2 code. It may simply be coincidence that H2 has a few more high BRS Gotchis than H1.

But how can we be sure?
jarrod already showed that the curves look extremely similar:

But could there be a difference solely for the high BRS Gotchis? Could these two different lines of code between H1 and H2 lead to changes in high BRS Gotchis?

H1 code: value /= 2;
vs
H2 code: value = value - 100;

To investigate that, I simulated 10.000 populations of 10.000 Gotchis for both the H1 and H2 code and took only the 10 highest BRS of each population for analysis. The result:
H1 Top10 BRS mean: 562.17
H1 Top10 BRS standard deviation: 2.78
H2 Top10 BRS mean: 562.02
H2 Top10 BRS standard deviation: 2.77

So in conclusion, there is no evidence that “value /= 2;” produces fewer high-BRS Gotchis than “value = value - 100;”
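For anyone who wants to poke at this, here is a minimal Python sketch of this kind of comparison. It is not the code behind the numbers above. Only the two quoted lines come from the H1 and H2 code; everything around them is an assumption made for illustration: one uniform byte (0 to 255) per trait, a uniform re-draw when the folded value is still above 99, six traits, and a per-trait score of `100 - value` below 50 and `value + 1` otherwise. The names `fold_h1`, `fold_h2`, `random_trait`, `trait_score` and `top_k_mean` are made up for this sketch.

```python
import random
import statistics

NUM_TRAITS = 6  # assumption: six numeric traits feed into BRS


def fold_h1(value):
    # H1 line quoted above: `value /= 2;` (integer division on unsigned ints)
    return value // 2


def fold_h2(value):
    # H2 line quoted above: `value = value - 100;`
    return value - 100


def random_trait(fold, rng):
    value = rng.randrange(256)  # assumption: one uniform byte per trait
    if value > 99:
        value = fold(value)
        if value > 99:
            # assumption: values still out of range are re-drawn uniformly
            value = rng.randrange(100)
    return value


def trait_score(value):
    # assumption: a trait is rarer the further it sits from the middle of 0..99
    return 100 - value if value < 50 else value + 1


def gotchi_brs(fold, rng):
    return sum(trait_score(random_trait(fold, rng)) for _ in range(NUM_TRAITS))


def top_k_mean(fold, rng, population=10_000, k=10):
    # Mean BRS of the k best Gotchis in one simulated population
    scores = sorted((gotchi_brs(fold, rng) for _ in range(population)), reverse=True)
    return statistics.mean(scores[:k])
```

Calling `top_k_mean(fold_h1, rng)` and `top_k_mean(fold_h2, rng)` many times with `rng = random.Random(seed)` gives two samples of top-10 means that can be compared. Pure Python is slow at 10.000 x 10.000, so smaller `population` values are more practical for a first look.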

EDIT: I only simulated 10.000 random Gotchis, NOT 10.000 portals with 10 Gotchis each. That is why top10 BRS means are low compared to the real data. I also ignored collateral modifiers, as they only give ±1 and hence could not be responsible for a BRS difference larger than 1.
I was convinced that if there is an inherent difference between the two different formulas for calculating traits, this would be enough to show the effect. Since there still seems to be some skepticism, I provide results of a more accurate simulation down below, including statistical tests.


What were the conditions of your simulation? If you could share more details as to what was used to simulate, we can know better if there are perhaps differences between the mechanics of the simulation vs of reality.

Also, as with other posts that have shared the results of “thousands of simulations”, you still always see more ultra-high BRS (565 and above) in the H2 results. Every time, this difference has been dismissed as minor, but if everybody’s simulations consistently give higher top BRS for H2, that could be taken as a real predisposition, one that the real-world results confirm (and aggravate).

Additionally, while we’re at it, let’s run simulations of how grossly under-represented H1 would be in the top 10, 20, 50 and 100 after all H2 portals are opened, and again after H3 and H4.
Simulations and charts are great, but real-world results and effects are more important.

We simply have different populations created by different code (and 6 months apart), where H1 isn’t ranking competitively in the main competition after only half the supply of H2 portals has been opened.

We may never have enough “evidence” to convince some of you of a BRS adjustment, and personal biases will always play a big part. I am not sure any simulations can account for everything. E.g., who can prove whether or not Polygon network congestion and average gas price affect the interaction with Chainlink’s VRF? Has anyone checked if there have been any changes to the VRF since Haunt1? Different servers? Beyond these examples, we have a statistician from the community telling us that the statistical comparison you are using to make the bold and dismissive conclusions in your post is not scientifically valid:


Where is the data for these simulations? Can it be added to this thread?

E.g., who can prove whether or not Polygon network congestion and average gas price affect the interaction with Chainlink’s VRF? Has anyone checked if there have been any changes to the VRF since Haunt1? Different servers?

This is just FUD. If you genuinely think Chainlink VRF is bugged, fraudulent or rigged why would you stay invested in Aavegotchi? Rarity and raffles would be completely broken.

Who said that lol? Those were cheeky examples, which took me 5 seconds to produce, of how difficult it is to create a controlled environment for ultimate comparisons and conclusions between haunts. They were followed by the real, valid argument from a statistician, as I already expect my points and sources to be discredited by you and the OP.

IIRC, there have been simulations by Doxy, Mori and the one in this thread.

I cannot attest to the validity or integrity of any of the simulations, however, so I will not go out of my way to find and post them. We clearly have an issue of bias depending on who presents the data, such as how you guys conveniently focus on the middle of the curve.

That should be the focus, you need a larger data set rather than just cherry-picking the slice you have a financial interest in.

If there is a statistical bias towards higher rarity gotchis in H2, then the standard deviation and variance should be higher for H2 portal options than they are for H1 portal options. This is not the case as of now.


To be clear, I agree with this and that goes both ways, fren.

The problem is the entire user base and economy is focused on the last inch extreme of the right curve, which you and others continue to dismiss in favor of looking at averages. It’s not about me, it’s about an economy that makes sense- over rewarding the holdouts that didn’t invest until H2 and now want to secure supremacy. In a 6 month project, you want those that invested substantially only in the last couple of weeks to secure a permanent advantage. Because you are one of them dear fren ser Jarrod, this is plain to see.

It’s not many of you, but you are a vocal few who had a mission to “nerf da whales”, and this unintended result from H2 has landed right on your plate. It’s all over your post history.
Of course you want nothing fixed.

I have said this before: maybe what is broken is the RF paradigm. If we can’t analyze the extreme right of the curve, and it’s a game of averages, then let’s distribute rewards by the same parameters as the voting power calculation, and move on from this broken and divisive paradigm of forcing two very different haunts to compete in the same BRS pool.


Like I said, in the OP I was mainly focused on the difference between the H1 and H2 code I quoted. I did not originally include 10 Gotchis per portal, nor did I include collateral trait modifiers. The analysis was done in MATLAB, so I used randi() instead of VRF. IMO, these things are not important for seeing the impact of the one line of code that differs between H1 and H2 (as long as everything else in the sim is equal between the test populations). To many people (including myself) it looked like that one line would account for the difference we’ve been seeing. However, that does not appear to be the case.

That said, creating a sim that is closer to the actual data allows for a better understanding of the situation. Hence, I have now included collateral trait modifiers, and each Gotchi in the 10.000 Gotchi population is now the best out of 10 random Gotchis. As was to be expected, that raised the mean BRS of the top 10 Gotchis, which is now very close to actual data.
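The portal mechanic (best of 10) and the pooling of top 10 values over many populations could be sketched in Python roughly like this. It is an illustration only, not the MATLAB code used for the results. `draw_brs` is any function that returns one random BRS; `placeholder_brs` is a hypothetical stand-in so the sketch runs on its own, and it does not implement the H1 or H2 trait formulas or the collateral modifiers.

```python
import random


def best_of_portal(draw_brs, rng, options=10):
    """Portal mechanic: keep the highest BRS among `options` random Gotchis."""
    return max(draw_brs(rng) for _ in range(options))


def top_k_of_population(draw_brs, rng, portals=10_000, options=10, k=10):
    """Open `portals` portals and return the k highest BRS values, best first."""
    population = [best_of_portal(draw_brs, rng, options) for _ in range(portals)]
    return sorted(population, reverse=True)[:k]


def pooled_top_k(draw_brs, rng, runs, portals=10_000, options=10, k=10):
    """Repeat the whole population `runs` times and pool all top-k values."""
    pooled = []
    for _ in range(runs):
        pooled.extend(top_k_of_population(draw_brs, rng, portals, options, k))
    return pooled


def placeholder_brs(rng):
    # Hypothetical stand-in, NOT the H1 or H2 trait code:
    # six independent trait scores, each uniform on 51..100.
    return sum(rng.randint(51, 100) for _ in range(6))
```

With `runs=10_000` and `k=10`, `pooled_top_k` returns 100.000 values per formula, which matches the n1 and n2 used in the comparison. Swapping `placeholder_brs` for an H1 or H2 draw function is the part that actually matters.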

To recap, bootstrapping was used to simulate 10.000 populations of 10.000 Gotchis, where each Gotchi was chosen to be the highest BRS Gotchi out of 10 randomly generated Gotchis (portal mechanic). From each population the 10 highest BRS Gotchis (top 0.1%) were taken to generate a probability distribution. This was done with the H1 formula and collaterals as well as the H2 formula and collaterals:

Fig. 1:

Using a Mann–Whitney U test, there seems to be a statistically significant difference between these two distributions:
p = 1.095e-06
z = -4.87
n1 = 100.000
n2 = 100.000

H1 mode: 573
H2 mode: 573
Difference = 0

H1 median: 574
H2 median: 575
Difference = +1

H1 mean: 575.3322
H2 mean: 575.416
Difference = +0.084

Given the large n, MWU has a very high statistical power. This means it is able to detect very subtle differences. Hence, it is important to look at the effect size, which at +0.084 BRS is very small.
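To make the point about power versus effect size concrete, here is a small Python sketch of a Mann–Whitney U test next to a plain difference in means. It uses the normal approximation with a tie correction and no continuity correction. Whether that matches the exact variant used for the numbers above is an assumption, and the function names are made up for this sketch.

```python
import math
import statistics


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def mann_whitney_u(x, y):
    """Return (U for x, z, two-sided p) via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        # Positions i..j-1 hold equal values and share the average of ranks i+1..j
        avg_rank = (i + 1 + j) / 2
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence either way
        return u_x, 0.0, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, z, two_sided_p(z)


def mean_shift(x, y):
    """Effect size in BRS units: mean of y minus mean of x."""
    return statistics.fmean(y) - statistics.fmean(x)
```

As a sanity check, `two_sided_p(-4.87)` comes out at roughly 1.1e-06, in line with the p-value reported above. The p-value shrinks as n grows even when `mean_shift` stays tiny, which is why the two should be read together.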

Here I sampled 1.000.000 top 10 Gotchis from the two probability distributions in Fig. 1 and compared the distribution of their means:

Fig. 2:

So the difference between the probability distributions of the top 10 Gotchis for the H1 and H2 simulations is very small. This could be caused by the ratio of +1 and -1 trait modifiers from the collaterals. In H1, 44.44% of collaterals are negative, and since H1 has fewer traits below 50, this could be a slight disadvantage and explain this very small difference.

That said, the actual difference in mean BRS between H1 and H2 as of right now is +4.4 BRS for H2. How likely is that? In order to answer this question, I sampled 1.000.000 top 10 Gotchis from the two probability distributions in Fig. 1 and compared the differences of their mean:

Fig. 3:

The red line shows the mean (0.084) and the green line the actual observed difference of 4.4. The probability of observing this or a more extreme difference is 2.1%.

On the one hand, 2.1% is very much possible, on the other hand it is unlikely enough to warrant further analysis.
I’d be glad if someone would try to reproduce my findings. If they are correct, then either a rather unlikely event happened, or the higher top 10 H2 values are caused by something other than the collaterals and one line code difference. In case of the latter, I’d be curious to know what.
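For anyone attempting the reproduction, the tail probability can be estimated with a resampling loop along these lines. This is a hedged Python sketch, not the original MATLAB code. `dist_h1` and `dist_h2` stand for the pooled top 10 values from each simulation, and the two-sided reading (differences at least as far from zero as the observed one) is an assumption about how "more extreme" is meant.

```python
import random
import statistics


def mean_difference_samples(dist_h1, dist_h2, rng, draws=10_000, k=10):
    """Draw k values (with replacement) from each distribution, `draws` times,
    and record mean(H2 sample) - mean(H1 sample) each time."""
    diffs = []
    for _ in range(draws):
        mean_h1 = statistics.fmean(rng.choices(dist_h1, k=k))
        mean_h2 = statistics.fmean(rng.choices(dist_h2, k=k))
        diffs.append(mean_h2 - mean_h1)
    return diffs


def tail_probability(diffs, observed, two_sided=True):
    """Share of simulated differences at least as extreme as `observed`."""
    if two_sided:
        hits = sum(1 for d in diffs if abs(d) >= abs(observed))
    else:
        hits = sum(1 for d in diffs if d >= observed)
    return hits / len(diffs)
```

With `observed=4.4`, `tail_probability(diffs, 4.4)` is the quantity to compare against the 2.1% above. The one-sided version (`two_sided=False`) only counts cases where H2 is ahead by that much.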

My conclusion might have been bold, or even premature, I agree. So I changed it to better reflect my findings. However, the rest of your statement is wrong in more ways than one. Also, I cannot imagine that Janbao appreciates being grossly misinterpreted like that. What you wrote and what he wrote are completely different.
FYI: Bootstrapping and nonparametric tests are not just scientifically valid but also less prone to errors than parametric statistics, because far fewer assumptions need to be met by the data.

Please stop digging yourself deeper. :man_facepalming: What you quoted is not even an argument.

Once you have something to contribute, you are very welcome to present it here.


So I have contributed?
Wait, I thought you were implying I have nothing to contribute.

Want me to find your litany of biased and socially resentful posts to prove how biased you are and therefore as ultimately unreliable in anything you present as I am?

In a 6 month project, you want those that invested substantially only in the last couple of weeks to secure a permanent advantage. Because you are one of them dear fren ser Jarrod, this is plain to see.

I have no mythical or godlike wearables (except for those I have fractional ownership in via Unicly), I am absolutely no threat to the top rankings in rarity. This is not my motivation.

It’s not many of you, but you are a vocal few who had a mission to “nerf da whales”, and this unintended result from H2 has landed right on your plate. It’s all over your post history.
Of course you want nothing fixed.

If we can get some conclusive proof that there is a flaw in the randomisation I will change my mind and be in support of a fix to remediate this gap for H1.


Thank you. On my end, I can concede I don’t have the statistical skills to engage further with OP on his results, should they actually be reliable and unbiased (another thing I can’t pretend to be versed in: verifying stat studies). My expertise is in economics and finance, however, and I continue to warn the community that having most people who have spent on high BRS for the majority of the project suddenly rekt after H2, and having to accept that “because statistics”, is just bad business. It simply doesn’t build the proper confidence in the market.

The original post claimed the matter was non-existent and settled; after my comments there is now acknowledgement that there could be something further, so I will attempt to end my contributions there and not engage in bitter exchanges.

Maybe we need devs to step in now with solutions, or the DAO to commission an independent study. I struggle to think of ways forward beyond that.


or the DAO to commission an independent study.

I would be in support of a DAO grant for this. Someone independent with the appropriate economic/mathematics background and qualifications can do this. Then there can be no accusations of bias.


I really like that you threw economics in there.
I fear a statistician could tell us what Janbao has already shared: the ultimate answer is there is no answer.

An economist or game economist could instead tell us how to fix the RF paradigm in a way that bonds the community vs the obvious division we see. I simply have become less and less of a fan of an RF paradigm that ultimately greatly rewards 1 person/wallet by several orders of magnitude, over the rest of the community, leading to the fragmentation of the community we can already witness.

H1 vs H2, BRS vs Kin, etc. I want to believe there is a creative solution that just clicks, one that makes sense in the way many successful game economies make organic sense, vs. being so open to endless debate and the creation of winners vs. losers.

In a nutshell, we may need not only a statistician to tell us what is or isn’t broken, but a game theorist that tells us how to take this (already great) economic paradigm up to the next notch, with inclusiveness and communal growth (co-win) more in mind, than settling for (and patching up?) what already is.


Thank you for the very detailed analysis. Can you further explain what you mean by the 2.1% probability? That means there is a 2.1% chance that the mean BRS of H2 would be higher than that of H1? That does seem like quite a small percentage.

Is the script you wrote to generate these findings open source? If so we should get more eyes on it to verify the findings.


EDIT: My mistake! The graph reads “Portal Options Base Rarity Score Distribution”, so that clearly includes the historical data I was looking for below. I thought it was only summoned gotchis.

Please ignore the rest:


Is there any historical data of what gotchis were available in portals that weren’t picked? It seems like this difference in variance at the high BRS end could potentially be explained by people just choosing poorly not knowing what they were doing earlier in the project.

Seems unlikely people would make that mistake inadvertently. Even if they look at the options in the open portal only and know nothing else about project meta, there’s color-coding for the rarity of the gotchi, that makes it clear the super high BRS ones are different than all the others.

I don’t think that data exists, but never know until you ask.

It means there is a 2.1% chance that the mean BRS of the Top10 Gotchis from H1, and the mean BRS of the Top10 Gotchis from H2, are 4.4 BRS (or further) apart.

After I wrote the post you quoted, I did two more tests. I tested the hypothesis that the actual 10 highest BRS Gotchis from H1 are from a distribution similar to the one I simulated. I also tested this for the actual 10 highest BRS Gotchis from H2 and the H2 distribution I simulated. For H2 there was no evidence upon which to reject the hypothesis. However, for H1, I was able to reject the hypothesis at p < 0.001.
This means that (completely independent of H2) the 10 highest BRS Gotchis from H1 have lower BRS than one would expect from the H1 code and its statistics.
Edit 1: However, this has to be taken with a grain of salt, since not all 10k portals have been opened yet. Testing the highest 10 Gotchis from 7k opened portals against a distribution of the highest 10 Gotchis from (simulated) 10k opened portals is biased towards rejection (since the high BRS Gotchis we’re statistically “missing” might “be hiding in” these unopened 3k portals).
Edit 2: I did another simulation (with only 7k portals this time) and found the same result. So the top 10 H1 Gotchis really do have lower BRS than one would expect from the statistics (if my code is comparable to what is happening on chain, which at least for my H2 simulation seems to be the case; but reproducing these findings couldn’t hurt).

I put the code on github: Code

However, I wrote it in MATLAB, which is not free. It’s what I’m used to for statistics and plotting. There is a free software alternative called Octave, which is usually very much compatible with MATLAB code. In case someone wants to try out the code, you can get Octave here: Download
But it is probably best if someone would just recreate something similar.


Thank you for this work and results.

Wondering if anyone in the community can take the randomization numbers from a certain population of H2 (the top 10, all, etc) and see what results the portal options would have contained under H1 code?

I am starting to suspect a top BRS gotchi from H2, if simulated under H1, would have the issue of a 102 result changed to 52 (if I recall the figures correctly).


Interesting results fren.
I ran some simulations with “unbugged haunts” and came to the same conclusion: what we observe is unusual but not impossible.
Here’s a link for a simple simulator in python: HauntsSimulator/Simulator.ipynb at main · letsgobankless/HauntsSimulator · GitHub

Right now, in the top 100 there are (I think) 60 H2 & 40 H1. If we use that simulator and run it 10 000 times, we find this happens 782 times, so about a 7.8% chance.

I think this is the main problem H1 is facing compared to H2, although it’s pretty negligible (about 40% of low-trait H1 gotchis compared to the 44.44% of negative collaterals)
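The top 100 count mentioned above can be sketched like this in Python. This is not the code from the linked notebook, just an illustration of the counting step. `draw_h1` and `draw_h2` are whatever functions produce one random BRS per haunt, and all names here are hypothetical.

```python
import heapq
import random


def h2_count_in_top(draw_h1, draw_h2, rng, n_h1, n_h2, top=100):
    """Merge both haunts and count how many of the `top` best BRS are H2."""
    scores = [(draw_h1(rng), "H1") for _ in range(n_h1)]
    scores += [(draw_h2(rng), "H2") for _ in range(n_h2)]
    rng.shuffle(scores)  # so that ties at the cut-off are not decided by haunt order
    best = heapq.nlargest(top, scores, key=lambda s: s[0])
    return sum(1 for _, haunt in best if haunt == "H2")


def fraction_at_least(draw_h1, draw_h2, rng, runs, threshold, n_h1, n_h2, top=100):
    """Share of simulated runs in which H2 holds at least `threshold` top spots."""
    hits = sum(
        1
        for _ in range(runs)
        if h2_count_in_top(draw_h1, draw_h2, rng, n_h1, n_h2, top) >= threshold
    )
    return hits / runs
```

Passing the same draw function for both haunts gives the “unbugged” baseline: how often a 60/40 split or worse shows up purely by chance.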


Please correct me if I’m wrong, but I would counter “negligible” is a very relative term, in the sense that after the top 100, it may approach that negligible status- but looking the other way, from the top 100 down to the top 50, top 25, top 10, it becomes more and more impactful, is it not?

It may be negligible in reaalm and aarcade mechanics, but there are obvious ramifications for RF, and solutions too: rewards could be altered to be less top-heavy, for instance, so that no tweaks to the actual gotchis are needed.


No. We were talking specifically about the trait modifiers. And top 10 is exactly what I looked at above. The average effect on the top 10 seems to be about 0.085 BRS less for H1 compared to H2. That is negligible.
Running the sim without collateral trait modifiers shows an average of 0.02 BRS more for H1 top 10 compared to H2. And that is likely just random variation, as it would never be exactly 0.0.

Given that the 0.085 BRS is not just tiny but also close to random fluctuation, I don’t see how this would have a noticeable impact on RF.
