Intro
I’ve seen a decent number of people ask how many Mystery Items one usually needs to open before they can expect a full set. Here, I’m defining a full set to be at least one of every item from the Mystery Item Pool. For example, if someone were opening Mystery ST Skins, a full set would be owning at least one of all 15 (at the time of writing) mini ST skins.
This is a great question for exploring some interesting math ideas, and it has applications in a lot of different video games (at least, ones with random lootboxes). The key assumptions to keep in mind are that
- opening Mystery Items is presumably independent,
- all items in the Mystery Item Pool are equally likely to be obtained in a single opening, and
- one can obtain multiples of the same item before obtaining a new unique item.
If the reader isn’t interested in the cool math, they are free to skip to the Results section where I’ve made some tables for various Mystery Items and how to calculate the number on their own, but I highly encourage everyone to read the cool math because, well, it’s cool.
Some Probability Background
What do I mean by independence? Essentially, opening one Mystery Item and getting some outcome shouldn’t change anything when opening another Mystery Item. This is important because it allows us to define some variables with neat properties. Let T be a random variable that represents “the number of Mystery Items one needs to open before obtaining a full set.” Random variables have a rigorous definition, but for our purpose just notice that T is
- Variable: because one lucky individual could potentially obtain all 15 mini ST skins in just 15 openings, but it might also potentially take 100 or even 1000 openings before obtaining all 15, and
- Random: because until one performs the experiment themselves, no one knows the value that T will take, and one can’t “solve for it” like one would expect for a usual variable.
We are interested in the quantity E(T), which refers to the “expected value of T” or the “expected number of Mystery Items one needs to open before obtaining a full set.” Expected value, or expectation, again has a rigorous definition but for our purposes we will think of it as
- Expectation: If we repeat the experiment forever and keep averaging the results we get from our random variable, this average will eventually converge to some number. We call this number the expected value of the random variable. This neat result is actually called the Law of Large Numbers.
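To make the "repeat the experiment and keep averaging" idea concrete, here is a small simulation sketch for the 15-skin example. It assumes the uniform, independent openings described in the Intro; the function names and the seed are illustrative choices, not part of any game API.

```python
import random

def openings_until_full_set(n, rng):
    """Open items uniformly at random until all n distinct items have been seen."""
    seen = set()
    openings = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # each of the n items is equally likely
        openings += 1
    return openings

def average_openings(n, trials, seed=0):
    """Average the number of openings over many repeated experiments."""
    rng = random.Random(seed)
    total = sum(openings_until_full_set(n, rng) for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    # The average settles down as the number of repeated experiments grows.
    for trials in (10, 1000, 20000):
        print(trials, average_openings(15, trials))
```

With only a handful of trials the average jumps around quite a bit; with many trials it stabilizes, which is the Law of Large Numbers at work.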
So, we know that we want E(T), but that seems hard to calculate directly. After all, we don’t know anything about the properties of T. However, we might be able to break the variable T down into more manageable variables. After all, if we are interested in the number of openings to get a full set, don’t we initially have to look at the openings to get the first unique item? After that, we also have to look at the number of openings to get the second unique item, and then the third, and so on. It’s natural to specify “unique” because we could obtain duplicate items. This is all the setup we need to start doing the meaningful work.
Collecting Coupons
Those who have taken a probability course likely already know this problem as the Coupon Collector’s Problem, and we will proceed exactly as the mythical coupon collector once did. Let T1 be a random variable that represents the “number of openings to get the first unique item,” T2 be a random variable that represents the “number of additional openings to get the second unique item” once the first has been obtained, and so on until Tn, where n is the number of unique items in the Mystery Item Pool (15 in the ST skins example). We have

$$T = T_1 + T_2 + \cdots + T_n.$$
There is a nice property of expected values that lets one break up the expected value of a sum into the sum of expected values. This property is called Linearity of Expectation, and using it here gives us

$$E(T) = E(T_1) + E(T_2) + \cdots + E(T_n).$$
Now we must ask the question: what do we know about the Ti’s (using i = 1, 2, …, n as an index here)? What makes them easier to work with than T? Let’s reword “openings” to “independent trials” and reword “obtain the i-th unique item” as “success.” Each of the Ti’s can then be described as “the number of independent trials until the first success,” and this follows a well-known probability distribution called the geometric distribution. For a geometric distribution, the expected number of trials until success is just the reciprocal of the probability of success per trial.
- Geometric Expectation: One may already have an intuitive understanding of geometric expectation from playing with dice. We know that the probability of getting any face on a fair six-sided die is 1/6. Let’s say someone wants to roll a 1 and is willing to keep rolling until they get their first 1. How many rolls should they expect to perform? Many would intuitively say six rolls. What if they instead want a 2? Well the probability of getting a 2 on any roll is still 1/6, so again they should expect to perform six rolls. Here, obtaining the number they want is the “success condition,” and for a fair six-sided die the probability of success is always 1/6, so one should always expect six rolls to first obtain any particular number (six being the reciprocal of 1/6).
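For readers who want to see where the reciprocal comes from, here is a short derivation sketch. The symbols X (the number of trials until the first success), p (the probability of success per trial), and k are notation introduced only for this aside.

$$
\begin{aligned}
P(X = k) &= (1-p)^{k-1}\, p, \qquad k = 1, 2, 3, \ldots \\
E(X) &= \sum_{k=1}^{\infty} k\,(1-p)^{k-1}\, p = p \cdot \frac{1}{\bigl(1 - (1-p)\bigr)^{2}} = \frac{1}{p}.
\end{aligned}
$$

The middle step uses the identity $\sum_{k \ge 1} k x^{k-1} = 1/(1-x)^2$ for $|x| < 1$ with $x = 1 - p$. For the die, $p = 1/6$ gives $E(X) = 6$, matching the intuition above.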
So what is the probability of getting a unique item in a single opening for each of the Ti’s? Well let’s look at T1. We are guaranteed to get a unique item on the first opening because no items have been seen yet. Thus, the probability of success is 1, and E(T1) = 1/1 = 1. What about T2? Well, we’ve already seen one of the items (the one from the first opening), so we don’t want to hit that one. There are n items total and we now only want to see n-1 of them (taking out the one from the first opening), so the probability of getting a new unique item on each opening is (n-1)/n and we have E(T2) = n/(n-1). Similarly, we can see that E(T3) = n/(n-2) and so on until E(Tn) = n/1 = n. Thus, we have

$$E(T) = \frac{n}{n} + \frac{n}{n-1} + \frac{n}{n-2} + \cdots + \frac{n}{2} + \frac{n}{1}.$$
We can rewrite this with some factoring as

$$E(T) = n\left(\frac{1}{n} + \frac{1}{n-1} + \frac{1}{n-2} + \cdots + \frac{1}{2} + 1\right).$$
The sum 1/n + 1/(n-1) + 1/(n-2) + … + 1/2 + 1 is known as the n-th Harmonic number, which we will denote Hn. We can conclude that

$$E(T) = n H_n.$$
In other words, we need to open n * Hn Mystery Items before we can expect the full set of n items from the Mystery Item Pool. For the Mystery ST Skin Example, this comes out to be 15 * H15, which is approximately 50 openings. Computational engines such as Wolfram Alpha will usually recognize the n-th Harmonic number if one uses the subscript notation “H_n.” If one would rather use a calculator over a computational engine, there is the nifty approximation

$$H_n \approx \ln(n) + 0.5772156649.$$
Here, ln(n) is the natural logarithm of n and 0.5772156649 is the first few decimals of the Euler–Mascheroni constant. The math behind this approximation is a bit beyond the scope of this post, but I encourage any interested reader to check it out, as the result is very deep and has applications in fields such as number theory, analysis, and cryptography/cybersecurity. Note that this approximation will be worse for small values of n, which is the case for some Mystery Items that have a small number of options in their Item Pool (such as Mystery Keys). Using this approximation gives

$$E(T) \approx n\left(\ln(n) + 0.5772156649\right).$$
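As a quick worked check for the Mystery ST Skin example (n = 15), the estimate n(ln(n) + 0.5772) can be compared against the exact value 15 * H15. The decimals below are rounded to four places.

$$
\begin{aligned}
H_{15} &\approx \ln(15) + 0.5772 \approx 2.7081 + 0.5772 = 3.2853, \\
E(T) &\approx 15 \times 3.2853 \approx 49.28, \\
\text{exact: } E(T) &= 15\, H_{15} \approx 15 \times 3.3182 \approx 49.77.
\end{aligned}
$$

The approximation undershoots by about half an opening here, so it rounds to 49 rather than 50. That is close enough for a rough estimate, but it illustrates why the exact sum is preferable for small item pools.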
Results and Tables
DISCLAIMER 1: The usual coupon collecting result only applies if all items are equally likely to be obtained (uniformly distributed) from an opening. One will not be able to use this result on most purchases from the Mystery Shop, as they often have a higher probability of getting things such as Shards and a lower probability of getting ST items. However, there are still methods to find the expectation if the underlying distribution of items is non-uniform but known.
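One such method for the non-uniform case is inclusion-exclusion over subsets of the wanted items. The sketch below is an illustration of that standard technique, not something taken from the game or derived in this post; the function name is hypothetical, and any probabilities fed into it have to come from a known drop table. Because it loops over every non-empty subset, it is only practical for small pools (roughly 20 items or fewer).

```python
from itertools import combinations

def expected_openings_nonuniform(probs):
    """Expected openings until every wanted item has been seen at least once.

    probs[i] is the chance that a single opening yields wanted item i.
    The values may sum to less than 1 if some outcomes (e.g. Shards)
    are not part of the set being collected.

    Inclusion-exclusion over non-empty subsets S of the wanted items:
        E(T) = sum over S of (-1)^(|S| + 1) / (sum of probs in S)
    """
    total = 0.0
    for size in range(1, len(probs) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(probs, size):
            total += sign / sum(subset)
    return total

if __name__ == "__main__":
    # Made-up probabilities purely for illustration.
    print(expected_openings_nonuniform([0.5, 0.3, 0.2]))
    # Sanity check: a uniform pool of 3 should match 3 * H3 = 5.5.
    print(expected_openings_nonuniform([1 / 3] * 3))
```

When all probabilities are equal to 1/n, this reduces to the n * Hn result from the previous section.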
DISCLAIMER 2: Keep in mind that actually performing the expected number of openings does not guarantee a full set. These Mystery Items are random, and one could potentially spend their entire RotMG career opening loot chests without obtaining their desired full set.
Below is a table of some common Mystery Items, the number of options in their item pools (n), and their expected number of openings to get a full set (rounded to the nearest integer):
Mystery Item | n | E(T) |
---|---|---|
Rare Mystery Character Skin | 27 | 105 |
Epic Mystery Character Skin | 28 | 110 |
Legendary Mystery Character Skin | 28 | 110 |
Rare Mystery Pet Skin | 27 | 105 |
Epic Mystery Pet Skin | 48 | 214 |
Legendary Mystery Pet Skin | 51 | 230 |
Rare Mystery Key | 8 | 22 |
Epic Mystery Key | 13 | 41 |
Legendary Mystery Key | 9 | 25 |
Mystery ST Skin | 15 | 50 |
Mystery ST Chest | 60 | 281 |
Mystery Stat Potion | 8 | 22 |
Shard of the Doorwarden x 35 | 3 | 6 |
Shard of the Intern x 15 | 16 | 54 |
For one’s own calculation of expectations, feel free to use this Wolfram Alpha link and simply replace the 100 with the number of possible items from the Mystery Item Pool.
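The same number can also be computed offline in a few lines of Python. This is only a sketch: the function names are illustrative, and the pool sizes are simply copied from the table above. Exact fractions are used so that rounding is not affected by floating-point error.

```python
from fractions import Fraction

def harmonic(n):
    """Exact n-th Harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def expected_openings(n):
    """Exact expected number of openings, n * H_n, for a uniform pool of n items."""
    return n * harmonic(n)

def round_half_up(x):
    """Round a non-negative value to the nearest integer, with .5 rounding up."""
    return int(x + Fraction(1, 2))

# Pool sizes copied from the table above.
POOL_SIZES = {
    "Rare Mystery Character Skin": 27,
    "Epic Mystery Pet Skin": 48,
    "Rare Mystery Key": 8,
    "Mystery ST Skin": 15,
    "Mystery ST Chest": 60,
    "Shard of the Doorwarden x 35": 3,
}

if __name__ == "__main__":
    for name, n in POOL_SIZES.items():
        value = expected_openings(n)
        print(f"{name}: n = {n}, E(T) = {float(value):.2f}, rounded = {round_half_up(value)}")
```

To check a different Mystery Item, call `expected_openings` with the size of its item pool.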
Other than the expected number of openings to get the full set, one may also be interested in how many openings are needed before one is 50%/90%/etc. likely to get a full set. These are known as the percentiles of T. Usually, percentiles are found from the exact distribution of a random variable. For the coupon collector’s problem the exact distribution can be written as an inclusion-exclusion sum, but it has no simple closed form that can be inverted by hand. Luckily, we do know a neat limit theorem for the distribution of T thanks to Laplace, Erdős, and Rényi. As usual, this approximation is better for larger n, so for things such as Mystery Keys the results may not be very accurate. However, we can still calculate a table of percentiles (rounded to the nearest integer):
Mystery Item | n | 50% | 60% | 70% | 80% | 90% | 95% | 99% |
---|---|---|---|---|---|---|---|---|
Rare Mystery Character Skin | 27 | 99 | 107 | 117 | 129 | 150 | 169 | 213 |
Epic Mystery Character Skin | 28 | 104 | 112 | 122 | 135 | 156 | 176 | 222 |
Legendary Mystery Character Skin | 28 | 104 | 112 | 122 | 135 | 156 | 176 | 222 |
Rare Mystery Pet Skin | 27 | 99 | 107 | 117 | 129 | 150 | 169 | 213 |
Epic Mystery Pet Skin | 48 | 203 | 218 | 235 | 258 | 294 | 328 | 407 |
Legendary Mystery Pet Skin | 51 | 219 | 235 | 253 | 277 | 315 | 352 | 435 |
Rare Mystery Key | 8 | 20 | 22 | 25 | 29 | 35 | 40 | 53 |
Epic Mystery Key | 13 | 38 | 42 | 47 | 53 | 63 | 72 | 93 |
Legendary Mystery Key | 9 | 23 | 26 | 29 | 33 | 40 | 47 | 61 |
Mystery ST Skin | 15 | 46 | 51 | 56 | 63 | 74 | 85 | 110 |
Mystery ST Chest | 60 | 268 | 286 | 308 | 336 | 381 | 424 | 522 |
Mystery Stat Potion | 8 | 20 | 22 | 25 | 29 | 35 | 40 | 53 |
Shard of the Doorwarden x 35 | 3 | 4 | 5 | 6 | 8 | 10 | 12 | 17 |
Shard of the Intern x 15 | 16 | 50 | 55 | 61 | 68 | 80 | 92 | 118 |
Of note: if one performs E(T) openings, they have a bit under a 60% chance of obtaining a full set. This can be seen by plugging the mean n * Hn into the limit theorem, which for large n gives roughly exp(-exp(-0.5772)) ≈ 0.57.
For one’s own calculation of percentiles, feel free to use this Wolfram Alpha link and simply replace the 100 with the number of possible items from the Mystery Item Pool and the 0.5 with the percentile of interest.
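The percentile table can also be reproduced offline. The sketch below assumes the limit theorem in question is the classical one, P(T ≤ n ln(n) + c n) → exp(-exp(-c)) as n grows. That assumption is consistent with the values in the table above, but the function names are illustrative.

```python
import math

def percentile_openings(n, p):
    """Approximate openings needed for probability p (0 < p < 1) of a full set.

    Solves exp(-exp(-c)) = p for c, then returns n * ln(n) + c * n.
    """
    c = -math.log(-math.log(p))
    return n * math.log(n) + c * n

def full_set_probability(n, t):
    """Approximate probability of holding a full set after t openings."""
    c = (t - n * math.log(n)) / n
    return math.exp(-math.exp(-c))

if __name__ == "__main__":
    n = 15  # Mystery ST Skin pool size
    for p in (0.5, 0.9, 0.99):
        print(p, round(percentile_openings(n, p)))
    # Probability of a full set after the expected number of openings, n * H_n.
    mean = n * sum(1 / k for k in range(1, n + 1))
    print(full_set_probability(n, mean))
```

Like the table, these values come from a large-n approximation, so they should be treated as rough guides for small item pools.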