Peter ([personal profile] ptc24) wrote 2010-05-06 08:24 am

And the winners are...

...[personal profile] damerell and [livejournal.com profile] kht on 0.9. My calculations typically came out around 0.903 or so. Figures are from here.

>>> import random
>>> sum([random.normalvariate(69.3, 2.92) > random.normalvariate(64.1, 2.75) for i in range(10000000)])
9025814


The sample size is too small to draw firm conclusions, but there seems to be a trend (with exceptions) that men thought the probability was smaller than women did. Also, both people who commented with some sort of reasoning went for 0.7, which is really quite on the low side.

[personal profile] simont 2010-05-06 08:28 am (UTC)(link)
Given those figures, the exact answer: standard deviations of 2.92 and 2.75 translate into variances of 8.5264 and 7.5625, so the difference between draws from the two distributions is normally distributed with mean 69.3 − 64.1 = 5.2 and variance 8.5264 + 7.5625 = 16.0889 (variances add for independent variables). The latter translates back into a standard deviation of 4.011. So the answer we're after is Φ(5.2 / √16.0889), which works out to 0.90258... .
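
That arithmetic can be checked in a few lines of standard-library Python (a sketch, not from the original thread; `math.erf` gives Φ through the usual identity):

```python
import math

def phi(x):
    # Standard normal CDF via the error function:
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu_diff = 69.3 - 64.1         # mean of the difference: 5.2
var_diff = 2.92**2 + 2.75**2  # variances add: 16.0889
p = phi(mu_diff / math.sqrt(var_diff))
print(p)  # ~0.9026
```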

[personal profile] simont 2010-05-06 08:33 am (UTC)(link)
Actually, given to how many digits that matched your Monte Carlo answer, I should have given a few more! 0.902581732473... .

[personal profile] simont 2010-05-06 08:33 am (UTC)(link)
And my error, as best I can recall from the thoroughly non-rigorous estimation process I did in my head, was to underestimate the standard deviations by some way.

[personal profile] simont 2010-05-06 08:42 am (UTC)(link)
I mean (a) overestimating the standard deviations, and (b) that I should go back to bed.

R :-)

[personal profile] emperor 2010-05-06 10:52 am (UTC)(link)
> pnorm(0,64.1-69.3,sqrt(2.92^2+2.75^2))
[1] 0.9025817

Saves you all that simulation :)

[personal profile] simont 2010-05-06 11:06 am (UTC)(link)
On the other hand, the simulation approach is worthwhile as a check, because it's easier to be sure there weren't any cock-ups like forgetting to square or square-root something.

(Actually, I'm faintly suspicious of the Monte Carlo result quoted above. I haven't actually done the chi-squared calculations, but it's so close to the correct answer that my instinct is to suspect a randomness problem!)
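
That "too close" instinct can be quantified with a simple binomial z-score rather than a full chi-squared test (a sketch, using the exact probability from the analytic calculation earlier in the thread):

```python
import math

# Binomial sanity check on the Monte Carlo count: with N draws and
# true success probability p, the count should land roughly within
# N*p +/- sqrt(N*p*(1-p)) of the expectation.
N = 10_000_000
p = 0.902581732473   # exact answer from the analytic calculation
observed = 9025814

expected = N * p
sd = math.sqrt(N * p * (1 - p))   # about 938
z = (observed - expected) / sd
print(z)  # roughly -0.0035: a tiny fraction of one standard deviation
```

A fair generator lands within a few thousandths of a standard deviation only a few tenths of a percent of the time, so the run really is unusually close; a single run can't distinguish luck from a randomness problem, though.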

[personal profile] simont 2010-05-06 12:49 pm (UTC)(link)
Ooh, I hadn't noticed before that DW notifications about comments having been edited include a comment saying what was changed. That's pretty cool.

normalvariate vs gauss: poking at the source code, it looks as if they use different generation methods, so presumably there's some tradeoff of performance against accuracy which power users might plausibly care about. The documentation doesn't bother to say what, though; it only mentions that gauss() "is slightly faster".
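
Anyone curious could measure that claim with a rough timeit comparison (a sketch; the numbers vary by machine and Python version, so none are quoted here):

```python
import timeit

# Time 100,000 calls to each generator; "is slightly faster" is the
# only claim the docs make, so just compare the two wall-clock totals.
t_normalvariate = timeit.timeit(
    "random.normalvariate(0.0, 1.0)", setup="import random", number=100_000)
t_gauss = timeit.timeit(
    "random.gauss(0.0, 1.0)", setup="import random", number=100_000)
print(f"normalvariate: {t_normalvariate:.3f}s  gauss: {t_gauss:.3f}s")
```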

(Also, gauss() is apparently undocumentedly non-thread-safe! It uses a method that generates two independent normal variates in one go, so it caches the second one and returns it from the next call. The comments in the code mention that you should therefore do some locking if you want to use gauss() in multiple threads on the same underlying random state, but somehow that doesn't seem to have found its way into the docs. Good grief.)
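
A minimal workaround for the threading hazard, as a sketch (the wrapper name `thread_safe_gauss` is made up for illustration): guard the shared Random instance with a lock, exactly as those source comments suggest.

```python
import random
import threading

# gauss() generates normal variates in pairs and caches the second one
# on the Random instance, so two threads sharing one instance can race
# on that cached value. A lock around the call serialises access.
_rng = random.Random(12345)
_lock = threading.Lock()

def thread_safe_gauss(mu, sigma):
    with _lock:
        return _rng.gauss(mu, sigma)

values = [thread_safe_gauss(0.0, 1.0) for _ in range(4)]
```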