Peter ([personal profile] ptc24) wrote 2010-05-06 08:24 am

And the winners are...

...[personal profile] damerell and [livejournal.com profile] kht on 0.9. My calculations typically came out around 0.903 or so. Figures are from here.

>>> import random
>>> sum([random.normalvariate(69.3, 2.92) > random.normalvariate(64.1, 2.75) for i in range(10000000)])
9025814


The sample size is too small to draw firm conclusions, but there seems to be a trend (with exceptions) that men thought the probability was smaller than women did. Also, both people who commented with some sort of reasoning went for 0.7, which is really quite on the low side.

[personal profile] simont 2010-05-06 08:28 am (UTC)(link)
Given those figures, the exact answer: standard deviations of 2.92 and 2.75 translate into variances of 8.5264 and 7.5625, so the difference between draws from the two distributions is normally distributed with mean 69.3 − 64.1 = 5.2 and variance 8.5264 + 7.5625 = 16.0889 (variances add for independent variables). The latter translates back into a standard deviation of 4.011. So the answer we're after is Φ(5.2 / √16.0889), which works out to 0.90258... .
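
That arithmetic can be checked in a few lines of standard-library Python (a sketch, not from the original thread; `math.erf` gives Φ through the usual identity):

```python
import math

def phi(x):
    # Standard normal CDF via the error function:
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu_diff = 69.3 - 64.1         # mean of the difference: 5.2
var_diff = 2.92**2 + 2.75**2  # variances add: 16.0889
p = phi(mu_diff / math.sqrt(var_diff))
print(p)  # ~0.9026
```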

[personal profile] simont 2010-05-06 08:33 am (UTC)(link)
Actually, given to how many digits that matched your Monte Carlo answer, I should have given a few more! 0.902581732473... .

[personal profile] simont 2010-05-06 08:33 am (UTC)(link)
And my error, as best I can recall from the thoroughly non-rigorous estimation process I did in my head, was to underestimate the standard deviations by some way.

[personal profile] simont 2010-05-06 08:42 am (UTC)(link)
I mean (a) overestimating the standard deviations, and (b) that I should go back to bed.

R :-)

[personal profile] emperor 2010-05-06 10:52 am (UTC)(link)
> pnorm(0,64.1-69.3,sqrt(2.92^2+2.75^2))
[1] 0.9025817

Saves you all that simulation :)

[personal profile] simont 2010-05-06 11:06 am (UTC)(link)
On the other hand, the simulation approach is worthwhile as a check, because it's easier to be sure there weren't any cock-ups like forgetting to square or square-root something.

(Actually, I'm faintly suspicious of the Monte Carlo result quoted above. I haven't actually done the chi-squared calculations, but it's so close to the correct answer that my instinct is to suspect a randomness problem!)
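
That "too close" instinct can be quantified with a simple binomial z-score rather than a full chi-squared test (a sketch, using the exact probability from the analytic calculation earlier in the thread):

```python
import math

# Binomial sanity check on the Monte Carlo count: with N draws and
# true success probability p, the count should land roughly within
# N*p +/- sqrt(N*p*(1-p)) of the expectation.
N = 10_000_000
p = 0.902581732473   # exact answer from the analytic calculation
observed = 9025814

expected = N * p
sd = math.sqrt(N * p * (1 - p))   # about 938
z = (observed - expected) / sd
print(z)  # roughly -0.0035: a tiny fraction of one standard deviation
```

A fair generator lands within a few thousandths of a standard deviation only a few tenths of a percent of the time, so the run really is unusually close; a single run can't distinguish luck from a randomness problem, though.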

[personal profile] simont 2010-05-06 12:49 pm (UTC)(link)
Ooh, I hadn't noticed before that DW notifications about comments having been edited include a comment saying what was changed. That's pretty cool.

normalvariate vs gauss: poking at the source code, it looks as if they use different generation methods, so presumably there's some tradeoff of performance against accuracy which power users might plausibly care about. The documentation doesn't bother to say what, though; it only mentions that gauss() "is slightly faster".
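
Anyone curious could measure that claim with a rough timeit comparison (a sketch; the numbers vary by machine and Python version, so none are quoted here):

```python
import timeit

# Time 100,000 calls to each generator; "is slightly faster" is the
# only claim the docs make, so just compare the two wall-clock totals.
t_normalvariate = timeit.timeit(
    "random.normalvariate(0.0, 1.0)", setup="import random", number=100_000)
t_gauss = timeit.timeit(
    "random.gauss(0.0, 1.0)", setup="import random", number=100_000)
print(f"normalvariate: {t_normalvariate:.3f}s  gauss: {t_gauss:.3f}s")
```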

(Also, gauss() is apparently undocumentedly non-thread-safe! It uses a method that generates two independent normal variates in one go, so it caches the second one and returns it from the next call. The comments in the code mention that you should therefore do some locking if you want to use gauss() in multiple threads on the same underlying random state, but somehow that doesn't seem to have found its way into the docs. Good grief.)
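
A minimal workaround for the threading hazard, as a sketch (the wrapper name `thread_safe_gauss` is made up for illustration): guard the shared Random instance with a lock, exactly as those source comments suggest.

```python
import random
import threading

# gauss() generates normal variates in pairs and caches the second one
# on the Random instance, so two threads sharing one instance can race
# on that cached value. A lock around the call serialises access.
_rng = random.Random(12345)
_lock = threading.Lock()

def thread_safe_gauss(mu, sigma):
    with _lock:
        return _rng.gauss(mu, sigma)

values = [thread_safe_gauss(0.0, 1.0) for _ in range(4)]
```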