Calling statisticians

Posted: Sat Nov 29, 2008 3:28 pm
by Charlie Reams
Do any of our resident statisticians have a smart guess for what sort of distribution this data might be drawn from? I've clipped off the long tail but it approaches zero pretty steadily.

Image

Re: Calling statisticians

Posted: Sat Nov 29, 2008 3:55 pm
by Paul Howe
Log-normal maybe?

They kind of look similar. I'm not too strong on stats so that's about the most insight I can offer.

Re: Calling statisticians

Posted: Sat Nov 29, 2008 4:03 pm
by Ben Wilson
Does kinda look like a skewed normal to me too, but my stats are so rusty it's unreal.

Re: Calling statisticians

Posted: Sat Nov 29, 2008 4:06 pm
by Charlie Reams
Log-normal seems very plausible based on the source. It's the data on how long it takes people to solve conundrums on Apterous, if you're interested. I'm doing something interesting with this data which I'll share at some point.

Re: Calling statisticians

Posted: Sat Nov 29, 2008 4:14 pm
by Paul Howe
Just had an idea that it might be an Erlang distribution, but you'd expect that to have a flatter peak given the length of the tail, and I can't see any reason that conundrum times would generate Erlang data now that's been revealed as the source.

Re: Calling statisticians

Posted: Sat Nov 29, 2008 4:22 pm
by Charlie Reams
It does look a bit Erlangy (in fact now you've said that I realise that's what was making it look familiar in the first place) but I know that human reaction times are distributed log-normal so it seems possible that other brain activities would be similar. I'll do some tests and find out.
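
For anyone wanting to try the same sort of test, here is a minimal sketch of one way to compare the two candidates. It is not necessarily the test Charlie has in mind. The solve times below are synthetic (drawn from a log-normal with made-up parameters), since the real data is not posted in the thread. The gamma fit uses the method of moments rather than full maximum likelihood to keep the code short, and an Erlang is just a gamma whose shape is a whole number.

```python
import math
import random
import statistics

random.seed(0)

# ASSUMPTION: synthetic stand-in for the solve times, in seconds.
# The parameters 1.0 and 0.8 are invented for illustration only.
times = [random.lognormvariate(1.0, 0.8) for _ in range(5000)]
logs = [math.log(t) for t in times]

# Log-normal: the maximum-likelihood estimates are simply the mean and
# (population) standard deviation of log(time).
mu = statistics.fmean(logs)
sigma = statistics.pstdev(logs)

# Gamma: method-of-moments estimates (a shortcut, not maximum likelihood).
mean = statistics.fmean(times)
var = statistics.pvariance(times)
shape = mean * mean / var
scale = var / mean


def lognormal_loglik(xs, mu, sigma):
    """Total log-likelihood of xs under a log-normal(mu, sigma)."""
    const = math.log(sigma) + 0.5 * math.log(2 * math.pi)
    return sum(
        -math.log(x) - const - (math.log(x) - mu) ** 2 / (2 * sigma ** 2)
        for x in xs
    )


def gamma_loglik(xs, shape, scale):
    """Total log-likelihood of xs under a gamma(shape, scale)."""
    const = math.lgamma(shape) + shape * math.log(scale)
    return sum((shape - 1) * math.log(x) - x / scale - const for x in xs)


ll_lognormal = lognormal_loglik(times, mu, sigma)
ll_gamma = gamma_loglik(times, shape, scale)

print(f"log-normal: mu={mu:.3f} sigma={sigma:.3f} loglik={ll_lognormal:.1f}")
print(f"gamma:      shape={shape:.3f} scale={scale:.3f} loglik={ll_gamma:.1f}")
```

Both models have two parameters, so the higher log-likelihood is a rough indication of the better fit. On real data a Q-Q plot of log(time) against a normal would be a sensible extra check.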

Re: Calling statisticians

Posted: Sat Nov 29, 2008 4:43 pm
by Kai Laddiman
My ranking on Apterous before and after I cheated?

Re: Calling statisticians

Posted: Sun Nov 30, 2008 12:40 pm
by Frank Rodolf
Charlie Reams wrote:It does look a bit Erlangy (in fact now you've said that I realise that's what was making it look familiar in the first place) but I know that human reaction times are distributed log-normal so it seems possible that other brain activities would be similar. I'll do some tests and find out.
And today's Daily Duel was one of those tests? ;)

Re: Calling statisticians

Posted: Sun Nov 30, 2008 2:05 pm
by Kirk Bevins
This is the best off-topic thread yet - love the curiosity of Charlie and love the responses.

Re: Calling statisticians

Posted: Sun Nov 30, 2008 2:27 pm
by Gavin Chipper
Frank Rodolf wrote:
Charlie Reams wrote:It does look a bit Erlangy (in fact now you've said that I realise that's what was making it look familiar in the first place) but I know that human reaction times are distributed log-normal so it seems possible that other brain activities would be similar. I'll do some tests and find out.
And today's Daily Duel was one of those tests? ;)
Would that work? Without any competition from any opposition, people are more likely to check and double check their answers. Unless Charlie has done that thing that was talked about where only the fastest gets the points. I'll do the duel now...

Re: Calling statisticians

Posted: Sun Nov 30, 2008 2:58 pm
by Charlie Reams
Frank Rodolf wrote:
Charlie Reams wrote:It does look a bit Erlangy (in fact now you've said that I realise that's what was making it look familiar in the first place) but I know that human reaction times are distributed log-normal so it seems possible that other brain activities would be similar. I'll do some tests and find out.
And today's Daily Duel was one of those tests? ;)
You overestimate my organisation. That duel was lined up ages ago. I just meant statistical tests on the existing data.

Re: Calling statisticians

Posted: Sun Nov 30, 2008 8:07 pm
by Howard Somerset
It has a vague likeness to a Poisson distribution with a mean of around 3 to 5, though it doesn't tail off quite quickly enough. See the mean 4 example here.

Re: Calling statisticians

Posted: Sun Nov 30, 2008 8:31 pm
by Michael Wallace
My first thought was a gamma, but log-normal looks about right too (depending on the parameters, obviously). If I wasn't in the middle of playing computer games I might think about the actual problem to try and decide which distributions are most appropriate.

Also, these data, not this data, n00b :evil:

Re: Calling statisticians

Posted: Sun Nov 30, 2008 11:39 pm
by Charlie Reams
Michael Wallace wrote:My first thought was a gamma, but log-normal looks about right too (depending on the parameters, obviously). If I wasn't in the middle of playing computer games I might think about the actual problem to try and decide which distributions are most appropriate.
Log-normal fits the data fairly well, but I'm still open to better suggestions. If anyone wants the raw data to play with then let me know.
Michael Wallace wrote:Also, these data, not this data, n00b :evil:
I'll start saying "these data" when you start saying "one panino please".

Re: Calling statisticians

Posted: Mon Dec 01, 2008 12:41 am
by Michael Wallace
Charlie Reams wrote:I'll start saying "these data" when you start saying "one panino please".
The wife and I make a point of saying pannino, not pannini, so nyer.

(not that I can remember ever asking for a pannino (or pannini))

Re: Calling statisticians

Posted: Mon Dec 01, 2008 1:14 am
by Kirk Bevins
Michael Wallace wrote:
Charlie Reams wrote:I'll start saying "these data" when you start saying "one panino please".
The wife and I make a point of saying pannino, not pannini, so nyer.

(not that I can remember ever asking for a pannino (or pannini))
Please try and spell them correctly. I always ask "do you do panini?" which sounds a bit odd and they then say "yes, we have bacon paninis, or cheese paninis". "I'll have a bacon panino please". I then had one woman say "sorry?" and I just said "a bacon one please" out of semi-embarrassment. Why should I get embarrassed by being correct?

Re: Calling statisticians

Posted: Mon Dec 01, 2008 1:26 am
by Michael Wallace
Kirk Bevins wrote:
Michael Wallace wrote:The wife and I make a point of saying pannino, not pannini, so nyer.

(not that I can remember ever asking for a pannino (or pannini))
Please try and spell them correctly.

Weird - I thought it was panini and the wife corrected me, and then I (somehow) thought that the forum spellchecker agreed with him, but clearly my eye was playing tricks on me.

Basically it wasn't my fault >_>

Re: Calling statisticians

Posted: Mon Dec 01, 2008 2:12 am
by Ben Hunter
Kirk Bevins wrote:
Michael Wallace wrote:
Charlie Reams wrote:I'll start saying "these data" when you start saying "one panino please".
The wife and I make a point of saying pannino, not pannini, so nyer.

(not that I can remember ever asking for a pannino (or pannini))
Please try and spell them correctly. I always ask "do you do panini?" which sounds a bit odd and they then say "yes, we have bacon paninis, or cheese paninis". "I'll have a bacon panino please". I then had one woman say "sorry?" and I just said "a bacon one please" out of semi-embarrassment. Why should I get embarrassed by being correct?
Correctness is a matter of context when it comes to language, though I'll probably use 'panino' in future, purely as a pretext for charming banter with attractive sandwich shop girls.

Re: Calling statisticians

Posted: Mon Dec 01, 2008 11:04 am
by Michael Wallace
Ben Hunter wrote:Correctness is a matter of context when it comes to language, though I'll probably use 'panino' in future, purely as a pretext for charming banter with attractive sandwich shop girls.
I don't know about anyone else, but I for one am certainly interested to find out whether your panino exploits get you anywhere...

Re: Calling statisticians

Posted: Mon Dec 01, 2008 11:24 am
by Jon Corby
It looks like my pyjama bottoms in the morning 8-)

Re: Calling statisticians

Posted: Mon Dec 01, 2008 11:46 am
by Charlie Reams
Ben Hunter wrote: Correctness is a matter of context when it comes to language, though I'll probably use 'panino' in future, purely as a pretext for charming banter with attractive sandwich shop girls.
I actually did this last time I was in Clowns, a cafe in Cambridge which is run by Italians. The ASSG (attractive sandwich shop girl) said "ohh, very good Italian" and smiled at me. It wasn't quite the full sex I was expecting, but still rewarding.

Re: Calling statisticians

Posted: Mon Dec 01, 2008 3:51 pm
by Michael Wallace
So I was thinking about this on the tube this morning. My main thoughts were about what factors are going to affect solving time, and then once you have these you can try and fit a model.

The two most obvious ones are player ability and conundrum difficulty. The first is easy to factor into our model, thanks to ratings (give or take the various problems with the system), the second one less so. I don't know how many conundrums have been given in multiple games, but that's one option for trying to assess their difficulty. Another might be some statistic for each conundrum on how often the word is used in English (although that's probably not easily available).

There are obviously going to be heaps of other things that influence the solving time, such as whether it's crucial (I would imagine people might be trying less hard if they've already won), or if the conundrum is needed to make a game a particularly good score. I doubt the second has much of an influence, and I'm not really convinced the first would either. There are probably other factors too, though.

But yeah, I'd start with data on the first two, assuming there's some extra information available to assess the conundrum difficulty, and then stick them into a model, maybe Time ~ Gamma(a,b) where a and b are functions of those factors. More interesting though would probably be using these data to get an assessment of the difficulty of conundrums, which is probably easier to do anyway.
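
As a rough sketch of what Time ~ Gamma(a,b) with covariates could look like: the version below keeps the shape a fixed and lets the rate b depend on player rating and conundrum difficulty through a log-linear mean. That is a simplification of "a and b are functions of those factors". All the names, coefficients and the simulated data are assumptions for illustration, not anything from Apterous.

```python
import math
import random


def gamma_loglik(rows, shape, c0, c_rating, c_difficulty):
    """Log-likelihood of solve times under Time ~ Gamma(shape, rate),
    where mean = exp(c0 + c_rating*rating + c_difficulty*difficulty)
    and rate = shape / mean."""
    total = 0.0
    for rating, difficulty, t in rows:
        mean = math.exp(c0 + c_rating * rating + c_difficulty * difficulty)
        rate = shape / mean
        total += (
            shape * math.log(rate)
            + (shape - 1) * math.log(t)
            - rate * t
            - math.lgamma(shape)
        )
    return total


random.seed(1)

# ASSUMPTION: invented "true" parameters used only to simulate data.
# Ratings and difficulties are standardised (mean 0, sd 1).
TRUE = dict(shape=2.0, c0=math.log(5.0), c_rating=-0.3, c_difficulty=0.5)

rows = []
for _ in range(2000):
    rating = random.gauss(0, 1)
    difficulty = random.gauss(0, 1)
    mean = math.exp(
        TRUE["c0"]
        + TRUE["c_rating"] * rating
        + TRUE["c_difficulty"] * difficulty
    )
    # random.gammavariate takes (shape, scale); scale = mean / shape.
    t = random.gammavariate(TRUE["shape"], mean / TRUE["shape"])
    rows.append((rating, difficulty, t))

# Compare the model that uses both factors with one that ignores them.
ll_with = gamma_loglik(rows, **TRUE)
ll_without = gamma_loglik(
    rows, shape=2.0, c0=math.log(5.0), c_rating=0.0, c_difficulty=0.0
)

print(f"with factors:    {ll_with:.1f}")
print(f"without factors: {ll_without:.1f}")
```

In practice the coefficients would be estimated by maximising this log-likelihood (any general-purpose optimiser would do) rather than plugged in as known values.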

Re: Calling statisticians

Posted: Mon Dec 01, 2008 3:54 pm
by Charlie Reams
Michael Wallace wrote: But yeah, I'd start with data on the first two, assuming there's some extra information available to assess the conundrum difficulty, and then stick them into a model, maybe Time ~ Gamma(a,b) where a and b are functions of those factors. More interesting though would probably be using these data to get an assessment of the difficulty of conundrums, which is probably easier to do anyway.
That's exactly what I'm doing, although it's harder than it sounds because, with over 8000 conundrums, the data for any given conundrum is pretty sparse. There are some other complications too, which I'll share when I write up the results some time next week.