Conundrum Affixes - A Statistical Approach

Simon Myers
Enthusiast
Posts: 295
Joined: Sat Dec 13, 2008 12:41 am
Location: Stamford, Connecticut

Conundrum Affixes - A Statistical Approach

Post by Simon Myers »

For many who play Countdown, a common heuristic for solving conundrums is affix matching: looking for common prefixes (UN-, OVER-) and suffixes (-ING, -IEST).

I wanted to see how useful this is and whether conventional wisdom is accurate in this case. Using the set of Apterous conundrums (thanks Charlie) I wrote a program that counted the conundrums containing a given set of letters (say I, N, G) and compared this with the number of conundrums that began/ended with those letters (-ING in this case). So in effect it measures how often the letters are a genuine affix rather than a decoy.
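
A minimal sketch of that measurement for a single suffix, not the original program. The helper names `contains_letters` and `suffix_hit_rate` and the four-word list are made up for illustration; the real run used the Apterous conundrum set.

```python
from collections import Counter

def contains_letters(word, letters):
    """True if `word` holds every letter of `letters`, counting repeats."""
    need, have = Counter(letters), Counter(word)
    return all(have[ch] >= n for ch, n in need.items())

def suffix_hit_rate(words, suffix):
    """Return (hits, candidates): words ending in `suffix` vs. words containing its letters."""
    candidates = [w for w in words if contains_letters(w, suffix)]
    hits = [w for w in candidates if w.endswith(suffix)]
    return len(hits), len(candidates)

# Toy data, not the Apterous set.
words = ["COMPUTING", "NEIGHBOUR", "GALVANISE", "SPRINTING"]
hits, candidates = suffix_hit_rate(words, "ING")
print(f"{100 * hits / candidates:.0f}% of {candidates} - ING")  # 50% of 4 - ING
```

For a prefix the same idea applies with `startswith` in place of `endswith`.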

I decided to post this here rather than in the Apterous subforum because I think the findings should have some use with respect to the show, with the caveat that heat game conundrums tend to have more common affixes than the whole Apterous set; finals (and CofC) game conundrums have fewer.

PREFIXES

73% of 30   - COMM
61% of 184  - EX
42% of 55   - COMP
41% of 173  - OVER
41% of 66   - SQ
33% of 1269 - RES
28% of 149  - QU
24% of 96   - APP
23% of 1308 - UN
22% of 183  - UNDER
21% of 196  - SUB
20% of 227  - FOR
18% of 564  - CON
16% of 437  - PRO
15% of 144  - IMM
14% of 605  - DIS
12% of 199  - SUP
11% of 412  - OUT
10% of 3144 - RE
09% of 693  - PRE
07% of 342  - INTER
07% of 498  - MIS

Also of interest might be the single-letter "prefixes" J (45% of 121), F (37% of 944), P (34% of 1727), W (33% of 652) and B (32% of 1296). Interestingly the letter N is the first letter of only 3% of the 4317 conundrums in which it appears. L, E, I, K, and O all sit around the 7% mark.

SUFFIXES

94% of 33   - FULLY
87% of 30   - OLOGY
76% of 1474 - ING
75% of 75   - IZED
72% of 53   - OUSLY
68% of 481  - NESS
62% of 798  - LY
58% of 60   - IFIED
50% of 2244 - ED
50% of 42   - WORK
48% of 48   - BOARD
44% of 315  - ABLE
42% of 206  - IZE
42% of 78   - ABLY
29% of 252  - LESS
27% of 812  - IEST
24% of 471  - TION
22% of 97   - IZER
21% of 482  - ATED
20% of 405  - IVE
14% of 3144 - ER
14% of 498  - ISM
12% of 1450 - ATE
10% of 396  - SION

Powerful single-letter "suffixes" include Y (70% of 1286), G (51% of 2220), D (49% of 2719) and E (22% of 5803). Avoid U (0.08% of 2384), V (0.1% of 715), and I (0.3% of 5145).


The next step would probably be an attempt at identifying letter modifiers to a prefix or suffix (e.g. perhaps the presence of the letter F increases the chance of an -ING ending to 85%, etc).
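
One way the modifier idea could be measured, as a sketch only: restrict the pool to conundrums that contain the suffix letters plus the extra letter, then see how many end in the suffix. The function name `hit_rate_with_modifier` and the word list are hypothetical, and the 85% figure above is a guess, not a result.

```python
from collections import Counter

def contains_letters(word, letters):
    """True if `word` holds every letter of `letters`, counting repeats."""
    need, have = Counter(letters), Counter(word)
    return all(have[ch] >= n for ch, n in need.items())

def hit_rate_with_modifier(words, suffix, modifier):
    """(hits, pool size) for `suffix` among words holding its letters plus `modifier`."""
    # Concatenating means a modifier that repeats a suffix letter must appear an extra time.
    pool = [w for w in words if contains_letters(w, suffix + modifier)]
    hits = sum(w.endswith(suffix) for w in pool)
    return hits, len(pool)

# Toy data: only the first two words contain an F alongside I, N, G.
words = ["CONFUSING", "FINGERTIP", "COMPUTING"]
print(hit_rate_with_modifier(words, "ING", "F"))  # (1, 2)
```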
Another idea I've had is to take, for example, the 24% of words that contain I, N, G but do not end -ING. Perhaps there's a useful set of suffixes to check for ING-decoys. Some research into a blended strategy (if one tries both -ABLY and -ABLE what are the chances of success?) could also prove fruitful. But that's for another day.
Last edited by Simon Myers on Fri Aug 07, 2009 2:01 am, edited 3 times in total.
Jon Corby
Moral Hero
Posts: 8021
Joined: Mon Jan 21, 2008 8:36 am

Re: Conundrum Affixes - A Statistical Approach

Post by Jon Corby »

Brilliant work. That's really interesting.
Jon O'Neill
Ginger Ninja
Posts: 4547
Joined: Tue Jan 22, 2008 12:45 am
Location: London, UK

Re: Conundrum Affixes - A Statistical Approach

Post by Jon O'Neill »

This is excellent. Well done!
Charlie Reams
Site Admin
Posts: 9494
Joined: Fri Jan 11, 2008 2:33 pm
Location: Cambridge

Re: Conundrum Affixes - A Statistical Approach

Post by Charlie Reams »

Nice work. Some surprising finds at the top, although I wonder how many of these patterns good conundrumists would be subconsciously aware of. Questions about methodology:

* What minimum (if any) did you set on how many words each set of letters had to appear in?
* Did you do some kind of prefix elimination? (e.g. I'd expect SQU- to make the list if SQ- did.)
Kevin Thurlow
Acolyte
Posts: 209
Joined: Mon Jan 21, 2008 11:08 am

Re: Conundrum Affixes - A Statistical Approach

Post by Kevin Thurlow »

That is interesting... So when I got "Neighbour", it was doubly interesting as it starts with "N" and the ING is mixed up... Perhaps someone who's more alert than me at the moment can say what ended with "V"?
Dinos Sfyris
Series 80 Champion
Posts: 2707
Joined: Mon Jan 21, 2008 10:07 am
Location: Sheffield

Re: Conundrum Affixes - A Statistical Approach

Post by Dinos Sfyris »

Kevin Thurlow wrote:That is interesting... So when I got "Neighbour", it was doubly interesting as it starts with "N" and the ING is mixed up... Perhaps someone who's more alert than me at the moment can say what ended with "V"?
Off the top of my head LEITMOTIV
Simon Myers
Enthusiast
Posts: 295
Joined: Sat Dec 13, 2008 12:41 am
Location: Stamford, Connecticut

Re: Conundrum Affixes - A Statistical Approach

Post by Simon Myers »

Charlie Reams wrote: * What minimum (if any) did you set on how many words each set of letters had to appear in?
In the program itself I set no minimum. I had it go through the whole list once and record all "affixes" of up to 5 letters. So for COMPUTING, say, it recorded C, CO, COM, COMP, COMPU and G, NG, ING, TING, UTING (or incremented the counter for those that had already been seen before). Once I had my full list of these I then went through the whole list of affixes and for each conundrum in turn incremented a counter for words that included the letters of the affix in any place. So in the end I had a list of affixes, number of conundrums with that affix, and number of conundrums that contained the letters of the affix in any position.

My point is that all the raw data is available. For reporting I limited it to those affixes that have appeared 20 times or more in the set, which means they must represent at least 0.25% of all conundrums. If I didn't do this you would have some stuff like COMMU- at 100% of 7, MIDWI- at 100% of 4 and -ZZLED at 75% of 4.
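
The two passes described above could be sketched roughly as follows. This is a reconstruction from the description, not the actual program: the names `affixes` and `affix_stats` are assumptions, and the threshold is applied before the second pass purely to save work (the reported rows come out the same).

```python
from collections import Counter

MAX_AFFIX_LEN = 5

def contains_letters(word, letters):
    """True if `word` holds every letter of `letters`, counting repeats."""
    need, have = Counter(letters), Counter(word)
    return all(have[ch] >= n for ch, n in need.items())

def affixes(word, from_end=False):
    """Prefixes (or suffixes) of 1 to MAX_AFFIX_LEN letters."""
    return [word[-k:] if from_end else word[:k] for k in range(1, MAX_AFFIX_LEN + 1)]

def affix_stats(words, from_end=False, min_count=20):
    """Map affix -> (times it occurs as an affix, conundrums containing its letters)."""
    # Pass 1: record every affix seen and how often it occurs.
    seen = Counter()
    for w in words:
        seen.update(affixes(w, from_end))
    # Pass 2: for each affix over the threshold, count words holding its letters anywhere.
    return {
        a: (n, sum(contains_letters(w, a) for w in words))
        for a, n in seen.items()
        if n >= min_count
    }

print(affixes("COMPUTING"))                 # ['C', 'CO', 'COM', 'COMP', 'COMPU']
print(affixes("COMPUTING", from_end=True))  # ['G', 'NG', 'ING', 'TING', 'UTING']

# Toy data with a lowered threshold so something survives the filter.
toy = ["COMPUTING", "COMPANIES", "ENCOMPASS"]
print(affix_stats(toy, min_count=2)["COMP"])  # (2, 3)
```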
Charlie Reams wrote: * Did you do some kind of prefix elimination? (e.g. I'd expect SQU- to make the list if SQ- did.)
Yes I did this by hand afterwards. The stats for SQU- and SQ- are identical as you might have assumed. There are lots of other things like -KING (34% of 101) and -LOGY (37% of 73) that make the list (209 prefixes and 249 suffixes occur at least 20 times in the set) but their inclusion above would probably obscure the useful results. There were no real surprises that I could see anyway, such as -NDING having a stronger correlation than -ING (it doesn't), so it would just make things confusing.
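
The SQ-/SQU- case could in principle be automated. A sketch, assuming a `stats` dict of prefix -> (hits, containing) pairs: drop any prefix whose pair is identical to that of a shorter prefix it extends. The function name and the counts below are illustrative, not taken from the CSV, and this does not reproduce the judgement calls made by hand (such as leaving out -KING).

```python
def drop_redundant_prefixes(stats):
    """Remove prefixes whose (hits, containing) pair equals that of a shorter prefix they extend."""
    kept = {}
    # Shortest first, so SQ is considered before SQU.
    for prefix in sorted(stats, key=len):
        redundant = any(
            prefix.startswith(shorter) and stats[shorter] == stats[prefix]
            for shorter in kept
        )
        if not redundant:
            kept[prefix] = stats[prefix]
    return kept

# Illustrative counts only.
stats = {"SQ": (27, 66), "SQU": (27, 66), "SU": (40, 300)}
print(sorted(drop_redundant_prefixes(stats)))  # ['SQ', 'SU']
```

For suffixes the `startswith` test would become `endswith`.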


For those that are interested, I've uploaded the CSV file with the raw data here
Jason Larsen
Postmaster General
Posts: 3902
Joined: Mon Jan 21, 2008 3:18 pm
Location: Seattle, Washington

Re: Conundrum Affixes - A Statistical Approach

Post by Jason Larsen »

That's very helpful, Clive!

Thank you!
Gavin Chipper
Post-apocalypse
Posts: 13277
Joined: Mon Jan 21, 2008 10:37 pm

Re: Conundrum Affixes - A Statistical Approach

Post by Gavin Chipper »

Jason Larsen wrote:That's very helpful, Clive!

Thank you!
Normally when you do that, it's at least in response to some post in the same thread! Now we have to search the whole forum for the relevant post!
Jason Larsen
Postmaster General
Posts: 3902
Joined: Mon Jan 21, 2008 3:18 pm
Location: Seattle, Washington

Re: Conundrum Affixes - A Statistical Approach

Post by Jason Larsen »

Gavin, I knew what I was talking about!
Kirk Bevins
God
Posts: 4923
Joined: Mon Jan 21, 2008 5:18 pm
Location: York, UK

Re: Conundrum Affixes - A Statistical Approach

Post by Kirk Bevins »

Jason Larsen wrote:Gavin, I knew what I was talking about!
But nobody else does, which is the idea of a forum.
Shaun Hegarty
Rookie
Posts: 51
Joined: Thu Jun 11, 2009 6:15 pm

Re: Conundrum Affixes - A Statistical Approach

Post by Shaun Hegarty »

I notice a lot of bio- conundrums, perhaps that could be included. Overall, though, an interesting set of statistics.
Jason Larsen
Postmaster General
Posts: 3902
Joined: Mon Jan 21, 2008 3:18 pm
Location: Seattle, Washington

Re: Conundrum Affixes - A Statistical Approach

Post by Jason Larsen »

Really?
Simon Myers
Enthusiast
Posts: 295
Joined: Sat Dec 13, 2008 12:41 am
Location: Stamford, Connecticut

Re: Conundrum Affixes - A Statistical Approach

Post by Simon Myers »

Shaun Hegarty wrote:I notice a lot of bio- conundrums, perhaps that could be included. Overall, though, an interesting set of statistics.
4% of 257.
Simon Myers
Enthusiast
Posts: 295
Joined: Sat Dec 13, 2008 12:41 am
Location: Stamford, Connecticut

Re: Conundrum Affixes - A Statistical Approach

Post by Simon Myers »

As an accompaniment to the regular conundrums, here are the hyper conundrums:

PREFIXES

55% of 119  - EX
42% of 57   - COMM
39% of 186  - OVER
27% of 1133 - UN
22% of 98   - COMP
22% of 90   - EXT
21% of 108  - SUPER
21% of 205  - UNDER
21% of 586  - DIS
20% of 113  - HYP
20% of 1265 - CO
19% of 170  - SUB
17% of 131  - APP
16% of 893  - PR
16% of 833  - CON
15% of 137  - MICRO
15% of 198  - SUP
14% of 1205 - DE
14% of 332  - COM
14% of 201  - DISC
11% of 175  - DEMO
11% of 254  - IMP
11% of 565  - PRO
11% of 2207 - RE
11% of 226  - UNP
11% of 317  - TRANS
11% of 186  - CONC
10% of 365  - CONS

Letters P (29% of 1270), W (26% of 160), D (26% of 1398), and F (25% of 511) are quite handy; N (2% of 3153), L (3% of 2403) and G (6% of 1510) are not.

SUFFIXES

95% of 38   - LESSNESS
90% of 110  - IZING
86% of 37   - FULLNESS
77% of 26   - FULLY
76% of 33   - OLOGICAL
75% of 61   - ABILITY
73% of 71   - IZATION
72% of 50   - OUSNESS
71% of 118  - IZED
69% of 913  - LY
68% of 1157 - ING
67% of 36   - ISHNESS
66% of 140  - OUSLY
65% of 163  - ICALLY
64% of 56   - TIVELY
62% of 300  - ALLY
56% of 46   - OLOGIST
55% of 479  - NESS
53% of 105  - VELY
53% of 49   - LOGICAL
44% of 314  - ABLE
41% of 1205 - ED
40% of 700  - ATION
34% of 158  - ABLY
33% of 1012 - TION
17% of 236  - TORY

Letters worth looking at include Y (81% of 1144), G (52% of 1510) and D (37% of 1398) [next best is E with 16% of 3319]. Avoid I (0.1% of 3566), A (0.6% of 1924), X (0.7% of 130) and F (1% of 511).

A VFSMB to whoever correctly identifies the single hyper conundrum that ends in X.
Paul Howe
Kiloposter
Posts: 1070
Joined: Tue Jan 22, 2008 2:25 pm

Re: Conundrum Affixes - A Statistical Approach

Post by Paul Howe »

Simon Myers wrote: A VFSMB to whoever correctly identifies the single hyper conundrum that ends in X.
PORTMANTEAUX?
Simon Myers
Enthusiast
Posts: 295
Joined: Sat Dec 13, 2008 12:41 am
Location: Stamford, Connecticut

Re: Conundrum Affixes - A Statistical Approach

Post by Simon Myers »

Paul Howe wrote:
Simon Myers wrote: A VFSMB to whoever correctly identifies the single hyper conundrum that ends in X.
PORTMANTEAUX?
Indeed. Should've posted when I knew you weren't lurking around here Paul. Try the 4 that end in I, which are more challenging I think.
Paul Howe
Kiloposter
Posts: 1070
Joined: Tue Jan 22, 2008 2:25 pm

Re: Conundrum Affixes - A Statistical Approach

Post by Paul Howe »

Simon Myers wrote:
Paul Howe wrote:
Simon Myers wrote: A VFSMB to whoever correctly identifies the single hyper conundrum that ends in X.
PORTMANTEAUX?
Indeed. Should've posted when I knew you weren't lurking around here Paul. Try the 4 that end in I, which are more challenging I think.
Ha, I think it's the first time I've logged in this weekend so you were quite unlucky!

I is harder, the only word that comes to mind atm is CARAVANSERAI? Sadly this will now be at the back of my mind for the rest of the evening.
Simon Myers
Enthusiast
Posts: 295
Joined: Sat Dec 13, 2008 12:41 am
Location: Stamford, Connecticut

Re: Conundrum Affixes - A Statistical Approach

Post by Simon Myers »

Paul Howe wrote: I is harder, the only word that comes to mind atm is CARAVANSERAI?
No, that's not one of the four. I suppose you could also consider yourself unlucky; falling foul of Charlie's somewhat arbitrary judgement in selecting fair conundrums.
Charlie Reams
Site Admin
Posts: 9494
Joined: Fri Jan 11, 2008 2:33 pm
Location: Cambridge

Re: Conundrum Affixes - A Statistical Approach

Post by Charlie Reams »

Simon Myers wrote:
Paul Howe wrote: I is harder, the only word that comes to mind atm is CARAVANSERAI?
No, that's not one of the four. I suppose you could also consider yourself unlucky; falling foul of Charlie's somewhat arbitrary judgement in selecting fair conundrums.
It was far from arbitrary, I selected exactly the conundrums I thought Paul Howe wouldn't be able to guess.
Paul Howe
Kiloposter
Posts: 1070
Joined: Tue Jan 22, 2008 2:25 pm

Re: Conundrum Affixes - A Statistical Approach

Post by Paul Howe »

Charlie Reams wrote:
Simon Myers wrote:
Paul Howe wrote: I is harder, the only word that comes to mind atm is CARAVANSERAI?
No, that's not one of the four. I suppose you could also consider yourself unlucky; falling foul of Charlie's somewhat arbitrary judgement in selecting fair conundrums.
It was far from arbitrary, I selected exactly the conundrums I thought Paul Howe wouldn't be able to guess.
You did a good job!

On further reflection, -US to -I plurals look to be good candidates, so I'm going for:

STREPTOCOCCI (vague memories of being stumped by this on a hypernundrum attack)

and, less confidently,

STRATOCUMULI and CUMULOSTRATI, which at least have some google hits but could easily be cases of latinus malapropis
Phil Reynolds
Postmaster General
Posts: 3329
Joined: Fri Oct 31, 2008 3:43 pm
Location: Leamington Spa, UK

Re: Conundrum Affixes - A Statistical Approach

Post by Phil Reynolds »

ELECTROPHORI?
Simon Myers
Enthusiast
Posts: 295
Joined: Sat Dec 13, 2008 12:41 am
Location: Stamford, Connecticut

Re: Conundrum Affixes - A Statistical Approach

Post by Simon Myers »

Paul Howe wrote: STREPTOCOCCI
Yes.
Paul Howe wrote: STRATOCUMULI
Yes.
Paul Howe wrote: CUMULOSTRATI
No.
Phil Reynolds wrote:ELECTROPHORI
No.
Kevin Thurlow
Acolyte
Posts: 209
Joined: Mon Jan 21, 2008 11:08 am

Re: Conundrum Affixes - A Statistical Approach

Post by Kevin Thurlow »

Thanks Dinos




(for LEITMOTIV)
Paul Howe
Kiloposter
Posts: 1070
Joined: Tue Jan 22, 2008 2:25 pm

Re: Conundrum Affixes - A Statistical Approach

Post by Paul Howe »

Simon Myers wrote:
Paul Howe wrote:
Simon Myers wrote: A VFSMB to whoever correctly identifies the single hyper conundrum that ends in X.
PORTMANTEAUX?
Indeed. Should've posted when I knew you weren't lurking around here Paul. Try the 4 that end in I, which are more challenging I think.
Right, I've been thinking about this non-stop for the last two months and still don't know the answer. Time to spill the beans, Myers.
Simon Myers
Enthusiast
Posts: 295
Joined: Sat Dec 13, 2008 12:41 am
Location: Stamford, Connecticut

Re: Conundrum Affixes - A Statistical Approach

Post by Simon Myers »

Paul Howe wrote:Right, I've been thinking about this non-stop for the last two months and still don't know the answer. Time to spill the beans, Myers.
Ah yes. In all fairness the two you didn't get were nigh on impossible. They are:
APPARATCHIKI
GASTROCNEMII
Simon Myers
Enthusiast
Posts: 295
Joined: Sat Dec 13, 2008 12:41 am
Location: Stamford, Connecticut

Re: Conundrum Affixes - A Statistical Approach

Post by Simon Myers »

I've finally got around to doing the newly added Apterous conundrums. There are only about 1100 of these so I've lowered the threshold for inclusion to 6 instead of 20. Due to the smaller sample there are a few differences between this and the main list. When the two lists are combined, the main stats change very little (INGs change by about 1% for example).
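
As a rough check on how little the main stats move, the two -ING rows quoted in this thread (76% of 1474 in the main list, 71% of 160 in the new one) can be pooled by weighting each percentage by its total. The helper name `pooled_rate` is made up, and since the published percentages are rounded the result is only approximate.

```python
def pooled_rate(groups):
    """Pool (percent, total) pairs into one percentage, weighting each by its total."""
    hits = sum(pct / 100 * total for pct, total in groups)
    total = sum(total for _, total in groups)
    return 100 * hits / total

# -ING: main list first, then the newly added conundrums.
print(round(pooled_rate([(76, 1474), (71, 160)]), 1))  # 75.5
```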

PREFIXES

55% of 11  - QUA
43% of 35  - EX
42% of 26  - OVER
40% of 20  - QU
36% of 165 - UN
25% of 73  - PRO
22% of 55  - OUT
22% of 23  - UNDER
18% of 33  - SUB
14% of 85  - DIS
10% of 462 - RE

SUFFIXES

80% of 10  - IZED
80% of 10  - ISHLY
72% of 11  - WOOD
71% of 160 - ING
63% of 11  - INGLY
61% of 54  - ABLE
56% of 39  - NESS
55% of 42  - IZE
53% of 17  - ALLY
47% of 43  - LESS
38% of 24  - FUL
33% of 18  - TORY
32% of 19  - BIRD
28% of 104 - IEST
27% of 462 - ER
22% of 64  - IVE
21% of 161 - EST
20% of 75  - ISM
16% of 58  - TION
15% of 61  - MAN
13% of 161 - IST
12% of 73  - ISH
12% of 52  - OUS

There are a whole bunch of 9 letter words that are conundrum-valid (have no anagrams, not plurals), around 3500, so sometime soon I might do the stats on those to see if there are any inherent biases in Charlie's conundrum selections.
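
A sketch of the "no anagrams" half of that filter, assuming a plain list of upper-case dictionary words: group the 9-letter words by their sorted letters and keep the ones that are alone in their group. The function name `anagram_free` is hypothetical, and the plural check is left out because it depends on what the word list records about each entry.

```python
from collections import defaultdict

def anagram_free(words, length=9):
    """Words of the given length whose letters form no other word in the list."""
    groups = defaultdict(list)
    for w in set(words):  # set() so a duplicated entry is not mistaken for an anagram
        if len(w) == length:
            groups["".join(sorted(w))].append(w)
    return sorted(g[0] for g in groups.values() if len(g) == 1)

# Toy data: CATECHISM and SCHEMATIC are anagrams of each other, so both drop out.
words = ["CATECHISM", "SCHEMATIC", "COMPUTING", "CAT"]
print(anagram_free(words))  # ['COMPUTING']
```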