How many Mystery Items do you need to open before collecting the whole set?


#1

Reddit crosspost

Intro

I’ve seen a decent number of people ask how many Mystery Items one usually needs to open before they can expect a full set. Here, I’m defining a full set to be at least one of every item from the Mystery Item Pool. For example, if someone were opening Mystery ST Skins, a full set would be owning at least one of all 15 (at the time of writing) mini ST skins.

This is a great question to explore some interesting math ideas, and it has applications in a lot of different video games (at least, ones with random lootboxes). The key ideas and assumptions to keep in mind are

  1. opening Mystery Items is presumably independent,
  2. all items in the Mystery Item Pool are equally likely to be obtained in a single opening, and
  3. one can obtain multiples of the same item before obtaining a new unique item.

If the reader isn’t interested in the cool math, they are free to skip to the Results section where I’ve made some tables for various Mystery Items and how to calculate the number on their own, but I highly encourage everyone to read the cool math because, well, it’s cool.

Some Probability Background

What do I mean by independence? Essentially, opening one Mystery Item and getting some outcome shouldn’t change anything when opening another Mystery Item. This is important because it allows us to define some variables with neat properties. Let T be a random variable that represents “the number of Mystery Items one needs to open before obtaining a full set.” Random variables have a rigorous definition, but for our purpose just notice that T is

  • Variable: because one lucky individual could potentially obtain all 15 mini ST skins in just 15 openings, but it might also potentially take 100 or even 1000 openings before obtaining all 15, and
  • Random: because until one performs the experiment themselves, no one knows the value that T will take, and one can’t “solve for it” like one would expect for a usual variable.

We are interested in the quantity E(T), which refers to the “expected value of T” or the “expected number of Mystery Items one needs to open before obtaining a full set.” Expected value, or expectation, again has a rigorous definition but for our purposes we will think of it as

  • Expectation: If we repeat the experiment forever and keep averaging the results we get from our random variable, this average will eventually converge to some number. We call this number the expected value of the random variable. This neat result is actually called the Law of Large Numbers.
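
To make the "repeat the experiment and keep averaging" idea concrete, here is a rough simulation sketch. It assumes the three assumptions from the Intro (independent, uniform openings with possible duplicates) and a pool of 15 items as in the ST skins example. The function names are my own and not from any game API.

```python
import random

def openings_for_full_set(n, rng):
    """Simulate one collector: open until all n items have been seen."""
    seen = set()
    openings = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # every item is equally likely
        openings += 1
    return openings

def running_averages(n, trials, seed=0):
    """Return the average of T after 1, 2, ..., trials repetitions."""
    rng = random.Random(seed)
    total = 0
    averages = []
    for k in range(1, trials + 1):
        total += openings_for_full_set(n, rng)
        averages.append(total / k)
    return averages

if __name__ == "__main__":
    avgs = running_averages(15, 20000)
    for k in (10, 100, 1000, 20000):
        print(k, round(avgs[k - 1], 2))
```

The early averages jump around, while the later ones should settle near a single value, which is what the Law of Large Numbers describes.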

So, we know that we want E(T), but that seems hard to calculate directly. After all, we don’t know anything about the properties of T. However, we might be able to break the variable T down into more manageable variables. After all, if we are interested in the number of openings to get a full set, don’t we initially have to look at the openings to get the first unique item? After that, we also have to look at the number of openings to get the second unique item, and then the third, and so on. It’s natural to specify “unique” because we could obtain duplicate items. This is all the setup we need to start doing the meaningful work.

Collecting Coupons

Those who have taken a probability course likely already know this problem as the Coupon Collector’s Problem, and we will proceed exactly as the mythical coupon collector once did. Let T1 be a random variable that represents the “number of openings to get the first unique item,” T2 be a random variable that represents the “number of additional openings, after the first unique item, to get the second unique item,” and so on until Tn, where n is the number of unique items in the Mystery Item Pool (15 in the ST skins example). Adding these up gives the total, T = T1 + T2 + … + Tn, so we have

E(T) = E(T1 + T2 + T3 + … + Tn).

There is a nice property of expected values that lets one break up the expected value of a sum into the sum of expected values. This property is called Linearity of Expectation, and using it here gives us

E(T) = E(T1 + T2 + T3 + … + Tn) = E(T1) + E(T2) + … + E(Tn).

Now we must ask the question: what do we know about the Ti’s (using i = 1, 2, …, n as an index here)? What makes them easier to work with than T? Let’s reword “openings” to “independent trials” and reword “obtain the i th unique item” as “success.” All of the Ti’s can basically be described as “the number of independent trials until success,” and this turns out to be a well-known probability distribution called the geometric distribution. Using the geometric distribution, one obtains that the expected number of trials until success is just the reciprocal of the probability of success per trial.

  • Geometric Expectation: One may already have an intuitive understanding of geometric expectation from playing with dice. We know that the probability of getting any face on a fair six-sided die is 1/6. Let’s say someone wants to roll a 1 and is willing to keep rolling until they get their first 1. How many rolls should they expect to perform? Many would intuitively say six rolls. What if they instead want a 2? Well, the probability of getting a 2 on any roll is still 1/6, so again they should expect to perform six rolls. Here, obtaining the number they want is the “success condition,” and for a fair six-sided die the probability of success is always 1/6, so one should always expect six rolls before first obtaining any number (six being the reciprocal of 1/6).
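
For anyone who wants to see where the reciprocal comes from, here is a short derivation sketch. X is my own name for the number of trials until the first success and p is the probability of success per trial. It uses the standard series identity that the sum of k x^(k-1) over k = 1, 2, … equals 1/(1-x)^2 when |x| < 1.

```latex
E(X) = \sum_{k=1}^{\infty} k\,(1-p)^{k-1}\,p
     = p \cdot \frac{1}{\bigl(1-(1-p)\bigr)^{2}}
     = \frac{1}{p},
\qquad
p = \tfrac{1}{6} \;\Rightarrow\; E(X) = 6.
```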

So what is the probability of getting a unique item in a single opening for each of the Ti’s? Let’s start with T1. We are guaranteed to get a unique item on the first opening because no items have been seen yet. Thus, the probability of success is 1, and E(T1) = 1/1 = 1. What about T2? We’ve already seen one of the items (the one from the first opening), so we don’t want to hit that one again. There are n items total and only n-1 of them are still new (taking out the one from the first opening), so each opening now succeeds with probability (n-1)/n and we have E(T2) = n/(n-1). Similarly, we can see that E(T3) = n/(n-2) and so on until E(Tn) = n/1 = n. Thus, we have

E(T) = E(T1 + T2 + T3 + … + Tn) = E(T1) + E(T2) + … + E(Tn) = 1 + n/(n-1) + n/(n-2) + … + n.

We can rewrite this with some factoring as

E(T) = 1 + n/(n-1) + n/(n-2) + … + n = n/n + n/(n-1) + n/(n-2) + … + n/1 = n * (1/n + 1/(n-1) + … + 1).

The sum 1/n + 1/(n-1) + 1/(n-2) + … + 1/2 + 1 is known as the n th Harmonic number, which we will denote Hn. We can conclude that

E(T) = n * Hn.
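
As a quick sanity check, here is the formula worked by hand for a tiny pool of n = 3 items (the size of the Shard of the Doorwarden pool in the tables further down):

```latex
E(T) = \frac{3}{3} + \frac{3}{2} + \frac{3}{1}
     = 3\left(\frac{1}{3} + \frac{1}{2} + 1\right)
     = 3 \cdot \frac{11}{6}
     = 5.5
```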

In other words, on average we need to open n * Hn Mystery Items to complete the full set of n items from the Mystery Item Pool. For the Mystery ST Skin example, this comes out to 15 * H15, which is about 49.8, or approximately 50 openings. Computational engines such as Wolfram Alpha will usually recognize the n th Harmonic number if one uses the subscript notation “H_n.” If one would rather use a calculator over a computational engine, there is the nifty approximation

Hn ~ ln(n) + 0.5772156649.

Here, ln(n) is the natural logarithm of n and 0.5772156649 is the first few decimals of the Euler–Mascheroni constant. The math behind this approximation is a bit beyond the scope of this post, but I encourage any interested reader to check it out, as the result is very deep and has applications in fields such as number theory, analysis, and cryptography/cybersecurity. Note that this approximation will be worse for small values of n, which is the case for some Mystery Items that have a small number of options in their Item Pool (such as Mystery Keys). Using this approximation gives

E(T) ~ n * (ln(n) + 0.5772156649).
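
For those who would rather script it, here is a minimal sketch that computes n * Hn exactly and compares it with the logarithmic approximation. The function names are my own. Exact fractions are used for the harmonic number so that no rounding creeps in before the final conversion.

```python
import math
from fractions import Fraction

EULER_MASCHERONI = 0.5772156649

def harmonic(n):
    """Exact n-th Harmonic number 1 + 1/2 + ... + 1/n as a Fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def expected_openings(n):
    """Exact expectation E(T) = n * Hn, returned as a float."""
    return float(n * harmonic(n))

def approx_expected_openings(n):
    """Approximation E(T) ~ n * (ln(n) + Euler-Mascheroni constant)."""
    return n * (math.log(n) + EULER_MASCHERONI)

if __name__ == "__main__":
    for n in (3, 8, 15, 60):
        exact = expected_openings(n)
        approx = approx_expected_openings(n)
        print(n, round(exact, 2), round(approx, 2))
```

In my own check the approximation comes out a little under half an opening below the exact value for these pool sizes. That gap is negligible for a pool of 60 but is a noticeable fraction of the answer for a pool of 3.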

Results and Tables

DISCLAIMER 1: The usual coupon collecting result only applies if all items are equally likely to be obtained (uniformly distributed) from an opening. One will not be able to use this result on most purchases from the Mystery Shop, as they often have a higher probability of getting things such as Shards and a lower probability of getting ST items. However, there are still methods to find the expectation if the underlying distribution of items is non-uniform but known.
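
One such method, sketched below, is an inclusion-exclusion formula: E(T) is the alternating sum, over every non-empty subset S of the pool, of 1 divided by the total probability of the items in S. The probabilities in the example are made up for illustration and are not real Mystery Shop drop rates. The function name is my own, and the loop visits every subset, so it is only practical for small pools.

```python
from itertools import combinations

def expected_openings_nonuniform(probs):
    """Expected openings for a full set when item i has probability probs[i].

    Uses E(T) = sum over non-empty subsets S of (-1)^(|S|+1) / sum(p_i in S).
    Runtime grows like 2^n, so keep the pool small.
    """
    total = 0.0
    for size in range(1, len(probs) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(probs, size):
            total += sign / sum(subset)
    return total

if __name__ == "__main__":
    # Hypothetical pool: one common item and two rarer ones.
    print(expected_openings_nonuniform([0.5, 0.25, 0.25]))
    # Uniform pool of 3, for comparison with n * Hn = 5.5.
    print(expected_openings_nonuniform([1 / 3, 1 / 3, 1 / 3]))
```

With equal probabilities this reduces to n * Hn. Making some items rarer pushes the expectation up, since the rarest item tends to be the last one missing.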

DISCLAIMER 2: Keep in mind that actually performing the expected number of openings does not guarantee you a full set. These Mystery Items are random, and one could potentially spend their entire RotMG careers opening loot chests without obtaining their desired full set.

Below is a table of some common Mystery Items, the number of options in their item pools (n), and their expected number of openings to get a full set (rounded to the nearest integer):

| Mystery Item | n | E(T) |
|---|---|---|
| Rare Mystery Character Skin | 27 | 105 |
| Epic Mystery Character Skin | 28 | 110 |
| Legendary Mystery Character Skin | 28 | 110 |
| Rare Mystery Pet Skin | 27 | 105 |
| Epic Mystery Pet Skin | 48 | 214 |
| Legendary Mystery Pet Skin | 51 | 230 |
| Rare Mystery Key | 8 | 22 |
| Epic Mystery Key | 13 | 41 |
| Legendary Mystery Key | 9 | 25 |
| Mystery ST Skin | 15 | 50 |
| Mystery ST Chest | 60 | 281 |
| Mystery Stat Potion | 8 | 22 |
| Shard of the Doorwarden x 35 | 3 | 6 |
| Shard of the Intern x 15 | 16 | 54 |

For one’s own calculation of expectations, feel free to use this Wolfram Alpha link and simply replace the 100 with the number of possible items from the Mystery Item Pool.

Other than the expected number of openings to get the full set, one may also be interested in how many openings are needed before one is 50%/90%/etc. likely to have a full set. These are known as the percentiles of T. Usually, percentiles are found from the exact distribution of a random variable. For the coupon collector’s problem an exact formula does exist (it comes from inclusion-exclusion), but it is awkward to invert for a given percentile. Luckily, we do know a neat limit theorem for the distribution of T thanks to Laplace, Erdős, and Rényi. As usual, this approximation is better for larger n, so for things such as Mystery Keys the results may not be very accurate. However, we can still calculate a table of percentiles (rounded to the nearest integer):

| Mystery Item | n | 50% | 60% | 70% | 80% | 90% | 95% | 99% |
|---|---|---|---|---|---|---|---|---|
| Rare Mystery Character Skin | 27 | 99 | 107 | 117 | 129 | 150 | 169 | 213 |
| Epic Mystery Character Skin | 28 | 104 | 112 | 122 | 135 | 156 | 176 | 222 |
| Legendary Mystery Character Skin | 28 | 104 | 112 | 122 | 135 | 156 | 176 | 222 |
| Rare Mystery Pet Skin | 27 | 99 | 107 | 117 | 129 | 150 | 169 | 213 |
| Epic Mystery Pet Skin | 48 | 203 | 218 | 235 | 258 | 294 | 328 | 407 |
| Legendary Mystery Pet Skin | 51 | 219 | 235 | 253 | 277 | 315 | 352 | 435 |
| Rare Mystery Key | 8 | 20 | 22 | 25 | 29 | 35 | 40 | 53 |
| Epic Mystery Key | 13 | 38 | 42 | 47 | 53 | 63 | 72 | 93 |
| Legendary Mystery Key | 9 | 23 | 26 | 29 | 33 | 40 | 47 | 61 |
| Mystery ST Skin | 15 | 46 | 51 | 56 | 63 | 74 | 85 | 110 |
| Mystery ST Chest | 60 | 268 | 286 | 308 | 336 | 381 | 424 | 522 |
| Mystery Stat Potion | 8 | 20 | 22 | 25 | 29 | 35 | 40 | 53 |
| Shard of the Doorwarden x 35 | 3 | 4 | 5 | 6 | 8 | 10 | 12 | 17 |
| Shard of the Intern x 15 | 16 | 50 | 55 | 61 | 68 | 80 | 92 | 118 |
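
The percentiles above appear consistent with the usual form of the limit theorem, in which the probability that T is at most n * ln(n) + c * n approaches exp(-exp(-c)) as n grows. Solving for the number of openings at a target probability p gives the sketch below. The function name is my own, and this reproduces the approximation only, not an exact percentile.

```python
import math

def percentile_openings(n, p):
    """Approximate openings needed to hold a full set with probability p.

    Inverts exp(-exp(-c)) = p to get c = -ln(-ln(p)), then returns
    n * ln(n) + c * n. Requires n >= 1 and 0 < p < 1.
    """
    c = -math.log(-math.log(p))
    return n * math.log(n) + c * n

if __name__ == "__main__":
    for n, p in ((27, 0.5), (15, 0.9), (60, 0.99)):
        print(n, p, round(percentile_openings(n, p)))
```

For example, n = 27 with p = 0.5 rounds to 99 and n = 60 with p = 0.99 rounds to 522, matching the corresponding table entries.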

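Of note: if one performs E(T) openings, they have roughly a 57% chance of holding a full set, which sits between the 50% and 60% columns above. This can be seen if one plugs the mean n * Hn ~ n * (ln(n) + 0.5772156649) into the limit theorem, which gives exp(-exp(-0.5772156649)) ~ 0.57.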

For one’s own calculation of percentiles, feel free to use this Wolfram Alpha link and simply replace the 100 with the number of possible items from the Mystery Item Pool and the 0.5 with the percentile of interest.


#2

OB moment. I’m too smol brain to understand though :pensive: . I guess I’ll take your results as fact then…


#3

Cool! This is great, I love how you explained everything. I have a question: if you are trying to collect all of the Agent of Oryx abilities, what would be the best way? I figured doing random draws for the first 8 and then doing the guaranteed draws for the last 8 would be best. Is my math correct here?

"Yeah, but the probability of that is low. It is in your favor to get 8 different abilities randomly and then use the 30 tokens. /u/Tmtoon literally did the math in his other comment.

In order to get 8 of the same ability it would be (1/16)^8 which comes out to be an absurdly low number.

You have a 100% chance to get an ability you didn’t already have on your first random roll. After that, you have a 15/16 chance to get one you didn’t get before. By the time you get 8, your chance is 50%, meaning on average it would take you 2 rolls to get an ability you don’t already have. At this point, it would be in your favor to use the 30 tokens (equivalent of 2 random rolls) to guarantee you get one you don’t already have."

Both my discrete math and statistics courses were hit by the break from Corona, and I really didn’t learn much from online school, so I’m not too confident in my answer


#4

This is a great question. Your math is correct in that by the time you obtain 8 items, your probability of getting a unique one on a random roll drops to 50%, which seems like a good place to stop rolling and start getting the guaranteed turn-ins. I think for the Agent of Oryx abilities, it’ll come down to a personal min-max analysis between how much you want to maximize the probability of getting a unique ability versus how much time you want to spend grinding for the shards, but overall I’d probably also stop rolling at 50% or even earlier.
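
To put rough numbers on the trade-off, here is a sketch that compares switch points. It assumes 16 abilities, uniform random rolls, and a guaranteed turn-in costing the equivalent of 2 random rolls (the 30 tokens from the quoted comment). The function names are my own, and the cost is measured in units of one random roll.

```python
def expected_cost(n, k, guaranteed_cost=2.0):
    """Expected cost of rolling randomly until k distinct abilities are
    owned, then buying the remaining n - k with guaranteed turn-ins."""
    random_part = sum(n / (n - i) for i in range(k))
    return random_part + guaranteed_cost * (n - k)

def best_switch_points(n, guaranteed_cost=2.0):
    """Return every k that minimizes the expected cost, plus that cost."""
    costs = {k: expected_cost(n, k, guaranteed_cost) for k in range(n + 1)}
    best = min(costs.values())
    return [k for k, c in costs.items() if abs(c - best) < 1e-9], best

if __name__ == "__main__":
    points, cost = best_switch_points(16)
    print(points, round(cost, 2))
    print(round(expected_cost(16, 16), 2))  # random rolls only
    print(round(expected_cost(16, 0), 2))   # guaranteed turn-ins only
```

Under these assumptions, switching after 8 or after 9 abilities ties for the lowest expected cost, because the ninth random roll costs 2 rolls on average, exactly the price of a guaranteed one.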


#5

bruh istg we were discussing this in the realmafia server a few days ago lmao


#6

The ‘mistake’ April Fool ability items from The Machine are currently a set of 15, so the same calculation as for the Mystery ST skin will apply, if you’re trying to collect the full set of those.


#7

Well you have to be careful here since running a single Machine doesn’t guarantee you a Mistake Ability, so you’ll spend on average quite a bit more time than what I’ve calculated above. The correct example would be if there was some hypothetical lootbox that always gave you a random Mistake Ability upon opening (with a uniform 1/15 probability of getting any one).
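
As a rough sketch of how much longer: suppose each Machine run independently drops a Mistake Ability with some probability q, and a drop is uniform over the n = 15 abilities. The symbol q is hypothetical here, since I don't know the actual drop rate. With k abilities already owned, a run yields a new one with probability q(n-k)/n, so the waiting times are still geometric and

```latex
E(\text{runs}) = \sum_{k=0}^{n-1} \frac{n}{q\,(n-k)} = \frac{n\,H_n}{q}
```

In other words, the roughly 50 openings from the ST skin calculation get divided by q. A purely illustrative q = 0.5 would mean about 100 runs on average.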


#8

Broke: Using math to be a productive member of society.
Woke: Using math to find the number of lootboxes you need to buy in a pixel game.

