## Sunday, September 18, 2011

### Straight From Probability

Let's say I randomly create a million bags.  Half of these bags contain 100 white marbles.  The other half contain 90 white marbles and 10 red marbles.  I then give you one of these bags, chosen at random, and offer you a prize if you can correctly guess whether it is an all-white-marbles (W) bag or a some-red-marbles (R) bag.

Of course, if you aren't allowed to look at the bag, you'll only have a 50% chance of guessing correctly.  But I'm a sporting guy, so I'm going to give you a little handicap.  I'm going to let you look at a single marble from the bag before you have to guess.  If you're lucky enough to pull out a red marble, you know exactly what to guess.  But what if you pull out a white marble?  What should you guess then?  What if I let you pull out two marbles?  Fifty?

This is a simplified version of what science faces every day.  For example, suppose a biologist is asked whether there are any white ravens.  The biologist has several hundred documented sightings of black ravens, and not a single documented sighting of any white ravens.  Now it could be that our universe is one of those bags where most but not all of the ravens are black, and we just haven't gotten our hands on a white one.  But just how likely is such a scenario?  How likely is it that the uniformity of our observations doesn't actually reflect a uniformity in nature?

Understanding why the Generalized Inductive Hypothesis (GIH) works is as simple as understanding why it's better to guess white after pulling a white marble from the bag.  It's all about working with limited knowledge in a way that maximizes your chances of being right.  And in order to do that, you have to update your probability distribution whenever you gain new information.

Let's take it step by step.  When I first give you the bag, you know there's a 50% chance that I handed you a W bag.  We're going to call this the prior probability because we assign it before we make any observations.

Now let's say you get to pull one marble out of the bag, and what do you know, it's white!  Now even if I had given you an R bag, this might have occurred.  But it's less likely for you to draw a white marble from an R bag than to draw a white marble from a W bag.  So the next step is to calculate what I'll call the un-normalized adjustment.

You know you have a random bag, and now you also know that one random marble in that bag was white.  What are the odds of getting an R bag and then drawing a white marble at random?  50% for the R bag times 90% for the draw = 45%.  What are the odds of getting a W bag and then drawing a white marble at random?  50% for the W bag times 100% for the draw = 50%.  You'll notice this is un-normalized because the total is 95%, not 100%.  The other 5% lies in the possible but didn't-happen scenario of getting an R bag and drawing a red marble at random.

Now that we have the un-normalized adjustment, we have to normalize it.  The simplest way to normalize is to divide each result by the total of 95% = 0.95.  So we get 52.6% for W, and 47.4% for R, which is normalized.  This is the updated probability, based on the new information.  Now, if all this fancy math I've done is correct, then what we have here is a prime example of the Generalized Inductive Hypothesis.  Given a bag with an unknown marble distribution, the observation of a white marble has made the chance of the bag being a W bag increase in comparison to the chance of it being an R bag.
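Here is a minimal Python sketch of that two-step update, in case seeing it as arithmetic helps.  The variable names (`prior`, `p_white`, `unnormalized`, `posterior`) are my own labels for illustration, and I'm using exact fractions just to avoid rounding noise.

```python
from fractions import Fraction

# Prior: half the bags are all-white (W), half are some-red (R).
prior = {"W": Fraction(1, 2), "R": Fraction(1, 2)}

# Chance that the first marble drawn is white, for each kind of bag.
p_white = {"W": Fraction(100, 100), "R": Fraction(90, 100)}

# Un-normalized adjustment: prior times the chance of the observation.
unnormalized = {bag: prior[bag] * p_white[bag] for bag in prior}
total = sum(unnormalized.values())  # 19/20, i.e. 95%

# Normalize so the updated probabilities sum to 1.
posterior = {bag: unnormalized[bag] / total for bag in prior}

for bag, p in posterior.items():
    print(f"{bag}: {float(p):.1%}")
# prints:
# W: 52.6%
# R: 47.4%
```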

To understand why the mathematics I've used is correct, let's think about a slightly different scenario.  Instead of getting a random bag, and then updating on the knowledge that a random marble from that bag was white, let's suppose the draw happens first.  I take my million bags and draw a random marble from each one.  I keep only those bags from which I drew a white marble, and reject any from which I drew a red marble.  On average, I will have 450 thousand of the R bags and all 500 thousand of the W bags.  Then from these remaining bags, I give you one at random.  Clearly, there is now a 450k/950k = 47.4% chance that the bag I gave you is an R bag, and a 500k/950k = 52.6% chance that the bag I gave you is a W bag.  Exactly the same as what we calculated before!
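If you'd rather see the filtering happen than take the averages on faith, here is a rough simulation sketch of the draw-first scenario.  The seed and the variable names are arbitrary choices of mine, and because the draws are random the counts will only land near 450 thousand, not exactly on it.

```python
import random

rng = random.Random(2011)  # fixed seed so the run is repeatable (arbitrary choice)

N_BAGS = 1_000_000
n_w = N_BAGS // 2          # all-white bags: the marble drawn is always white
n_r = N_BAGS - n_w         # bags with 90 white and 10 red marbles

# Keep an R bag only if the single marble drawn from it is white (90 of 100).
kept_r = sum(1 for _ in range(n_r) if rng.randrange(100) < 90)
kept_w = n_w               # every W bag survives the filter

kept = kept_r + kept_w
print(f"kept R bags: {kept_r}, kept W bags: {kept_w}")
print(f"P(R | white) ~ {kept_r / kept:.3f}")
print(f"P(W | white) ~ {kept_w / kept:.3f}")
```

The two fractions printed at the end should sit close to the 47.4% and 52.6% worked out by hand.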

So what we calculated in the first scenario is the odds of getting a random bag from which one white marble has been drawn.  But we've calculated this even though we didn't know the result of the draw until after we got the bag.  This is what I meant by updating your distribution.  Before the draw, we just had a random bag.  But after the draw, we had a random bag that had produced a white marble.  We then updated our probability distribution to reflect this new knowledge.

And if I let you draw a second marble, and it too is white, then the probabilities will become even more extreme.  While it takes 91 draws to be absolutely certain you have a W bag, it takes just 36 white marbles in a row to be 99% sure.  That's barely more than a third of the bag!  So there's no need for biologists to go looking for every single raven.  With numbers like these, observing just a third of the raven population will give us a very high confidence level in the "all ravens are black" hypothesis.  It all follows straight from the mathematics of probability theory.
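One way to check how fast the confidence climbs is to repeat the same update for every draw.  The sketch below assumes the marbles are drawn without replacement (which is what makes 91 draws enough for certainty) and keeps the 50/50 prior; `posterior_w` and `first_99` are hypothetical names I made up for this example.

```python
from fractions import Fraction

def posterior_w(k, prior_w=Fraction(1, 2), n_white=90, n_total=100):
    """P(W bag | first k draws were all white), drawing without replacement."""
    # Chance that an R bag yields k white marbles in a row.
    like_r = Fraction(1)
    for i in range(k):
        like_r *= Fraction(max(n_white - i, 0), n_total - i)
    # A W bag always yields white, so its likelihood is 1.
    joint_w = prior_w * 1
    joint_r = (1 - prior_w) * like_r
    return joint_w / (joint_w + joint_r)

for k in (1, 2, 10, 20, 50, 91):
    print(f"{k:3d} white draws -> P(W) = {float(posterior_w(k)):.4f}")

# Smallest number of all-white draws that pushes P(W) to at least 99%.
first_99 = next(k for k in range(1, 101) if posterior_w(k) >= Fraction(99, 100))
print("draws needed for 99%:", first_99)
```

Changing `prior_w`, `n_white`, or `n_total` shows how the required number of draws shifts with a different starting assumption.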

And there is no "principle of uniformity" either.  In fact, since the order of information doesn't affect the result, there's no need for a time axis at all!  Induction is more general than that.  We can apply it in a timeless scenario.  We can even apply it across time.  We can look at observations from the past and use them to conclude temporal uniformity.  Because it's facts about probability that form the foundation of induction, not temporal uniformity.

And finally, even the math isn't all that important.  It doesn't matter that I chose 1 million or 90% or a 50/50 prior or any of that.  What matters is the upper bound.  If we draw a white marble, we ask "Given hypothesis H, what are the odds of drawing a white marble?"  These odds are at their highest when H is "all the marbles in the bag are white."  It can't get any higher.  You can't have more white marbles in the bag than you have marbles in the bag.  You can't have a greater than 100% chance of getting a white marble.  Something can't happen more often than always.  It's a tautology.  If we have a W bag, then all the marbles are white, so we have to draw a white marble.  So "W and we draw a white marble" can't be any less likely than the prior for W.  But for any other kind of bag, we can draw a non-white marble.  It's possible, so it must have some chance of occurring.  Thus "R and we draw a white marble" has to be less likely than the prior for R.  R's (un-normalized) likelihood has gone down, while W's has remained the same.  Thus W/R has gone up.  All because "we draw a white marble" is more likely for a W bag than for an R bag, or for any other bag.  That's the GIH.
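To see that the particular numbers really don't matter, here is a small sketch that tries a few other priors and a few other bag mixes and checks that W/R always rises after a white draw.  The function name `odds_after_white` and the values in the loops are my own picks for illustration; the only requirement is that the other bag's chance of white is above zero and below 100%.

```python
from fractions import Fraction

def odds_after_white(prior_w, p_white_given_other):
    """Return (prior odds, updated odds) of W against another bag type after one white draw."""
    prior_other = 1 - prior_w
    before = prior_w / prior_other
    # A W bag yields white with probability 1; the other bag with some p < 1.
    after = (prior_w * 1) / (prior_other * p_white_given_other)
    return before, after

for prior_w in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)):
    for p in (Fraction(99, 100), Fraction(9, 10), Fraction(1, 2)):
        before, after = odds_after_white(prior_w, p)
        assert after > before  # W/R always goes up
        print(f"prior W={prior_w}, P(white|other)={p}: W/R {before} -> {after}")
```

The updated odds are just the prior odds divided by `p`, so any `p` below 1 pushes them up, no matter how small the prior for W was.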

#### 1 comment:

1. Nice! I'm currently working my way through Probability Theory: The Logic of Science, by E.T. Jaynes. He connects Bayesian/GIH reasoning to both formal logic and scientific inference. Super-highly recommend it for anyone interested in this stuff.