ptc24: (tickybox)
posted by [personal profile] ptc24 at 08:24am on 06/05/2010
...[personal profile] damerell and [livejournal.com profile] kht on 0.9. My calculations typically came out around 0.903 or so. Figures are from here

>>> import random
>>> sum([random.normalvariate(69.3, 2.92) > random.normalvariate(64.1, 2.75) for i in range(10000000)])
9025814


The sample size is too small to draw firm conclusions, but there seems to be a trend (with exceptions) that men thought the probability was smaller than women did. Also both people who commented with some sort of reasoning went for 0.7, really quite on the low side.
There are 10 comments on this entry.
posted by [personal profile] simont at 08:28am on 06/05/2010
Given those figures, the exact answer: standard deviations of 2.92 and 2.75 translate into variances of 8.5264 and 7.5625, so that the difference between those two distributions is a normal distribution with mean 5.2 and variance 16.0889. The latter translates back into a standard deviation of 4.011. So the answer we're after is Φ(5.2 / √16.0889), which works out to 0.90258... .
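The arithmetic above is easy to check with the standard normal CDF written in terms of math.erf (a quick sketch, not something from the original thread):

```python
import math

# P(X > Y) for X ~ N(69.3, 2.92^2) and Y ~ N(64.1, 2.75^2):
# X - Y ~ N(5.2, 2.92^2 + 2.75^2), so we want Phi(5.2 / sqrt(16.0889)).
mean_diff = 69.3 - 64.1
sd_diff = math.sqrt(2.92**2 + 2.75**2)   # sqrt(16.0889), about 4.011

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p = phi(mean_diff / sd_diff)
print(p)  # ~0.902581732...
```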
posted by [personal profile] simont at 08:33am on 06/05/2010
Actually, given how many digits of that matched your Monte Carlo answer, I should have given a few more: 0.902581732473... .
posted by [personal profile] simont at 08:33am on 06/05/2010
And my error, as best I can recall from the thoroughly non-rigorous estimation process I did in my head, was to underestimate the standard deviations by some way.
posted by [personal profile] ptc24 at 08:41am on 06/05/2010
Hmmm, surely underestimating the standard deviations would have given a higher probability, not a lower one? Do you mean overestimating the standard deviations, or underestimating the difference in means?
posted by [personal profile] simont at 08:42am on 06/05/2010
I mean (a) overestimating the standard deviations, and (b) that I should go back to bed.
posted by [personal profile] emperor at 10:52am on 06/05/2010
> pnorm(0,64.1-69.3,sqrt(2.92^2+2.75^2))
[1] 0.9025817

Saves you all that simulation :)
posted by [personal profile] simont at 11:06am on 06/05/2010
On the other hand, the simulation approach is worthwhile as a check, because it's easier to be sure there weren't any cock-ups like forgetting to square or square-root something.

(Actually, I'm faintly suspicious of the Monte Carlo result quoted above. I haven't actually done the chi-squared calculations, but it's so close to the correct answer that my instinct is to suspect a randomness problem!)
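One way to quantify that suspicion, without a full chi-squared test, is a binomial z-score (a sketch using the figures quoted in the post, not a calculation anyone in the thread ran):

```python
import math

# If the closed-form answer is right, the count of successes in n trials
# is binomial(n, p); see how far the quoted run landed from expectation.
n = 10000000
p = 0.902581732473   # closed-form answer from upthread
observed = 9025814   # Monte Carlo count quoted in the post

expected = n * p                      # about 9025817.3
sigma = math.sqrt(n * p * (1 - p))    # binomial standard deviation, ~938
z = (observed - expected) / sigma
print("z = %.4f" % z)
```

That run landed within about 0.004 standard deviations of the expectation, which by chance should happen well under one percent of the time, so the suspicion was reasonable; the later re-runs (9026585 and 9026391) sit a far more typical 0.6 to 0.8 sigma away.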
posted by [personal profile] ptc24 at 12:38pm on 06/05/2010
Yeah. That many significant figures... Let's try a second run

Python 2.5.2 (r252:60911, Jun  6 2008, 23:32:27)
[GCC 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> sum([random.normalvariate(69.3, 2.92) > random.normalvariate(64.1, 2.75) for i in range(10000000)])
Traceback (most recent call last):
  File "", line 1, in 
NameError: name 'random' is not defined
>>> import random
>>> sum([random.normalvariate(69.3, 2.92) > random.normalvariate(64.1, 2.75) for i in range(10000000)])
9026585
>>> sum([random.normalvariate(69.3, 2.92) > random.normalvariate(64.1, 2.75) for i in range(10000000)])
9026391
>>> sum([random.gauss(69.3, 2.92) > random.gauss(64.1, 2.75) for i in range(10000000)])
9025517


Hmmm, looks like a coincidence to me.

Quite why the Python random module has two functions for generating random numbers from a Gaussian I don't know...
Edited (Removed an accidentally duplicated line) Date: 2010-05-06 12:39 pm (UTC)
posted by [personal profile] simont at 12:49pm on 06/05/2010
Ooh, I hadn't noticed before that DW notifications about comments having been edited include a comment saying what was changed. That's pretty cool.

normalvariate vs gauss: poking at the source code it looks as if they use different generation methods, so presumably there's some tradeoff of performance against accuracy which power users might plausibly care about. The documentation doesn't bother to say what, though; it only mentions that gauss() "is slightly faster".

(Also, gauss() is apparently undocumentedly non-thread-safe! It uses a method that generates two independent normal variates in one go, so it caches the second one and returns it from the next call. The comments in the code mention that you should therefore do some locking if you want to use gauss() in multiple threads on the same underlying random state, but somehow that doesn't seem to have found its way into the docs. Good grief.)
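The caching pattern is easy to illustrate. This is a sketch of the same trick using the Box-Muller transform, which produces two independent variates per pass; it is an illustration of the technique, not the CPython source itself:

```python
import math
import random

_cached = None

def gauss_pair(mu=0.0, sigma=1.0):
    """Box-Muller: generate two independent standard normals per pass,
    cache the second, and hand it back on the next call. Like
    random.gauss(), this is not thread-safe: two threads sharing the
    generator could race on _cached."""
    global _cached
    if _cached is not None:
        z, _cached = _cached, None
    else:
        a = random.random()
        b = random.random()
        r = math.sqrt(-2.0 * math.log(1.0 - b))  # 1 - b avoids log(0)
        z = math.cos(2.0 * math.pi * a) * r
        _cached = math.sin(2.0 * math.pi * a) * r
    return mu + sigma * z
```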
posted by [personal profile] ptc24 at 12:57pm on 06/05/2010
it caches the second one and returns it from the next call

A while ago it had the problem that this caching didn't pay attention to re-seeding the RNG, and so you could do

>>> random.seed(10)
>>> random.gauss(0,1)
-0.95371700806333715
>>> random.seed(10)
>>> random.gauss(0,1)
-0.95371700806333715


and get a different answer the second time (an alternating pattern of two answers, in fact), because the cached second variate survived the re-seed. Evidently this has been fixed: the transcript above now gives the same answer both times.
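In current CPython the fix is that seed() discards the cached variate, which is easy to check directly (a quick sketch, assuming a modern random module):

```python
import random

random.seed(10)
a = random.gauss(0, 1)
random.seed(10)
b = random.gauss(0, 1)
# With the old bug, b could have been a stale cached second variate from
# before the re-seed; after the fix, re-seeding clears the cache, so
# the two calls are reproducible and a == b.
print(a == b)
```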
