If we ask whether a fact about a person identifies that person, it turns out that the answer isn't simply yes or no. If all I know about a person is their ZIP code, I don't know who they are. If all I know is their date of birth, I don't know who they are. If all I know is their gender, I don't know who they are. But it turns out that if I know these three things about a person, I could probably deduce their identity! Each of the facts is partially identifying.

There is a mathematical quantity which allows us to measure how close a fact comes to revealing somebody's identity uniquely. That quantity is called entropy, and it's often measured in bits. Intuitively, you can think of entropy as a generalization of the number of different possibilities there are for a random variable: if there are two possibilities, there is 1 bit of entropy; if there are four possibilities, there are 2 bits of entropy, etc. Adding one more bit of entropy doubles the number of possibilities.

Because there are around 7 billion humans on the planet, the identity of a random, unknown person contains just under 33 bits of entropy (two to the power of 33 is roughly 8.6 billion). When we learn a new fact about a person, that fact reduces the entropy of their identity by a certain amount:

ΔS = -log2 Pr(X=x)

Where ΔS is the reduction in entropy, measured in bits, and Pr(X=x) is simply the probability that the fact would be true of a random person. Let's apply the formula to a few facts, just for fun:

Starsign: ΔS = -log2 Pr(STARSIGN=capricorn) = -log2 (1/12) = 3.58 bits of information

Birthday: ΔS = -log2 Pr(DOB=2nd of January) = -log2 (1/365) = 8.51 bits of information

Note that if you combine several facts together, you might not learn anything new; for instance, telling me someone's starsign doesn't tell me anything new if I already knew their birthday.

In the examples above, each starsign and birthday was assumed to be equally likely. The calculation can also be applied to facts which have non-uniform likelihoods. For instance, the likelihood that an unknown person's ZIP code is 90210 (Beverly Hills, California) is different from the likelihood that their ZIP code would be 40203 (part of Louisville, Kentucky). As of 2007, there were 21,733 people living in the 90210 area, only 452 in 40203, and around 6.625 billion on the planet.

Knowing that I live in Moscow: ΔS = -log2 (10,524,400/6,625,000,000) = 9.30 bits

How much entropy is needed to identify someone?

As of 2007, identifying someone from the entire population of the planet required:

S = -log2 (1/6,625,000,000) = 32.6 bits of information.

Conservatively, we can round that up to 33 bits.
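
The arithmetic above can be reproduced with a short Python sketch. The function name `surprisal_bits` and the layout of the `facts` dictionary are illustrative assumptions, not something taken from the post; the population figures are the 2007 numbers quoted in the text.

```python
import math


def surprisal_bits(matching: float, population: float) -> float:
    """Return -log2(matching / population): the bits of identifying
    information gained from a fact true of `matching` people out of
    `population`. (Hypothetical helper, named here for illustration.)"""
    if not 0 < matching <= population:
        raise ValueError("need 0 < matching <= population")
    return -math.log2(matching / population)


# World population as of 2007, as quoted in the text.
WORLD_2007 = 6_625_000_000

# Each fact maps to (number of matching cases, size of the whole set).
facts = {
    "starsign (1 of 12, assumed uniform)": (1, 12),
    "birthday (1 of 365, assumed uniform)": (1, 365),
    "ZIP code 90210": (21_733, WORLD_2007),
    "ZIP code 40203": (452, WORLD_2007),
    "lives in Moscow": (10_524_400, WORLD_2007),
    "one specific person": (1, WORLD_2007),
}

for label, (matching, population) in facts.items():
    print(f"{label}: {surprisal_bits(matching, population):.2f} bits")
```

The rarer the fact, the more bits it contributes: the small 40203 ZIP code is far more identifying than 90210. Adding the bits from several facts is only valid when the facts are independent, which is why a starsign adds nothing once the birthday is known.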