How many words would you need to know...

Discuss anything interesting but not remotely Countdown-related here.

Moderator: Jon O'Neill

Jon Corby
Moral Hero
Posts: 8021
Joined: Mon Jan 21, 2008 8:36 am

How many words would you need to know...

Post by Jon Corby »

... in order to understand the entire dictionary?

(This question has been nicked from another forum I frequent, and I suggested that with the cross-section of word nerds and stats geeks we have here, we might be able to do something interesting with it)

Essentially the whole dictionary is just a load of cyclical references; every single word is defined using other words, which are themselves defined elsewhere in the dictionary, and so on and so forth. So if you look up word [1] and it uses word [2] that you don't (yet) understand, which in its definition uses word [3] that you don't (yet) understand, but on reading the definition for word [3] you already understand all the words used, that avenue is closed. You now fully understand words [1], [2] and [3], and won't need to look them up if you encounter them in definitions elsewhere.
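The look-up process above can be sketched as a recursive search. This is a minimal Python illustration, not something from the thread: `definitions` (headword to the list of words in its definition), `known` and `can_understand` are hypothetical names, and the toy entries word1 to word5 are made up. A look-up fails if it reaches a word with no definition, or loops back to a word it is still in the middle of looking up.

```python
def can_understand(word, definitions, known, _in_progress=None):
    """Recursively look up `word`, following the [1] -> [2] -> [3] chain.

    definitions: dict mapping each headword to the words used in its definition.
    known: set of words already understood; newly learned words are added to it.
    """
    if _in_progress is None:
        _in_progress = set()
    if word in known:
        return True
    if word not in definitions or word in _in_progress:
        # Either the word is never defined, or the chain of look-ups has
        # looped back to a word we are still trying to understand.
        return False
    _in_progress.add(word)
    understood = all(
        can_understand(w, definitions, known, _in_progress)
        for w in definitions[word]
    )
    _in_progress.discard(word)
    if understood:
        known.add(word)  # this avenue is closed: no need to look it up again
    return understood


definitions = {
    "word1": ["the", "word2"],
    "word2": ["a", "word3"],
    "word3": ["the", "a"],
    "word4": ["word5"],  # word4 and word5 are defined only in terms of each other
    "word5": ["word4"],
}
known = {"the", "a"}
print(can_understand("word1", definitions, known))  # True
print(can_understand("word4", definitions, known))  # False: stuck in a cycle
```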

So how many words would you need prior knowledge of in order to be able to (eventually) completely 'understand' the dictionary?
Charlie Reams
Site Admin
Posts: 9494
Joined: Fri Jan 11, 2008 2:33 pm
Location: Cambridge

Re: How many words would you need to know...

Post by Charlie Reams »

Layman's overview: This is a really nice question.

Mathematician's overview: Can be easily restated as a graph theory problem, viz. for a directed graph (V,E) find S, the minimal subset of V such that all paths from all vertices in (V,E) reach some element of S. My intuition is: find the longest cycle, output some vertex from this cycle as "need to know", delete it and repeat. This seems sensible but I can't prove that it's optimal.
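One way to make that condition precise, as a sketch under the assumption that every defined word has a non-empty definition: a starting set S works exactly when it contains every word that is used but never defined, and the words outside S contain no cycle. The words outside S can then be learned in dependency order, whereas a cycle with no word in S can never be entered. The smallest S is therefore the undefined words plus a minimum feedback vertex set (a smallest set of words meeting every cycle). Finding a minimum feedback vertex set is NP-hard in general, so a greedy rule such as "break the longest cycle first" is best treated as a heuristic.

```latex
% G = (V, E) with an edge u -> v when the definition of u uses v;
% T = set of vertices with no outgoing edges (words used but never defined).
\begin{align*}
  S \text{ is sufficient} &\iff T \subseteq S \text{ and } G[V \setminus S] \text{ is acyclic},\\
  \min_{S \text{ sufficient}} |S| &= |T| + \tau(G),
\end{align*}
% tau(G): size of a minimum feedback vertex set of G
% (smallest vertex set meeting every directed cycle, self-loops included).
```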

If anyone has a dictionary file with definitions (Sid?) I'd be interested to give it a go.
Jon Corby
Moral Hero
Posts: 8021
Joined: Mon Jan 21, 2008 8:36 am

Re: How many words would you need to know...

Post by Jon Corby »

I haven't a fucking clue what you're on about, but I hoped you'd say something like that :)

Have you got a "gut-feel" for the number? The most popular guess seemed to be "quite few, less than a hundred" which seems ludicrously low to me, but I haven't really the foggiest.
Charlie Reams
Site Admin
Posts: 9494
Joined: Fri Jan 11, 2008 2:33 pm
Location: Cambridge

Re: How many words would you need to know...

Post by Charlie Reams »

The algorithm doesn't give much of a gut feel, because it works on arbitrarily structured dictionaries, and of course real dictionaries are far from arbitrary, e.g. a lot of definitions will depend on the word "the", whereas hardly any will depend on the word "emu". I would guess more than a few hundred though, because you can bet there are lots of obscure cycles, like "snurg: an animal that eats flurpweed, flurpweed: an African flower eaten by snurgs", and you have to know at least one word in each of those short cycles, as well as some basic key words like "a".
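Cycles of length two, like the snurg/flurpweed pair, are easy to list mechanically. A minimal sketch over a hypothetical headword-to-words mapping; Charlie's entry says "snurgs", which is written as "snurg" here because this sketch does no stemming (a real run would need to map inflected forms back to headwords). Every pair found needs at least one of its two words known in advance, so the number of pairwise disjoint pairs is a lower bound on the answer.

```python
def mutual_pairs(definitions):
    """Return pairs of words that each use the other in their definitions (2-cycles)."""
    pairs = set()
    for word, used in definitions.items():
        for other in used:
            if other != word and word in definitions.get(other, ()):
                pairs.add(tuple(sorted((word, other))))
    return sorted(pairs)


definitions = {
    "snurg": ["an", "animal", "that", "eats", "flurpweed"],
    "flurpweed": ["an", "african", "flower", "eaten", "by", "snurg"],
    "animal": ["a", "living", "creature"],
}
print(mutual_pairs(definitions))  # [('flurpweed', 'snurg')]
```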
Simon Myers
Enthusiast
Posts: 295
Joined: Sat Dec 13, 2008 12:41 am
Location: Stamford, Connecticut

Re: How many words would you need to know...

Post by Simon Myers »

Obviously the answer to the question is highly dependent on the richness of the language used in the dictionary. I'd imagine you'd get a different outcome if you were to compare a concise dictionary with the ODE or OED.

@Charlie - The Zyzzyva website has an annotated CSW (and OWL2, though I think they're very similar) dictionary file with simple definitions. These were hand-made by an enthusiast, I think, and tend to be very concise, so I'm not sure how generalisable the results you'd get would be.
If desired, I can provide WordNet definitions in a similar text format, though these tend to be very verbose and (I think) come from disparate sources.
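Whichever file is used, the first job is turning it into word-to-word links. A minimal sketch, assuming (hypothetically; the real file layouts haven't been checked) one entry per line with the headword first and the definition text after it. Repeated headwords (multiple senses) are merged, everything is lower-cased, and the tokenizer is deliberately crude: it keeps runs of letters only, so hyphens and apostrophes split words. The MEANIE and ATONIC definitions quoted in this post serve as sample input.

```python
import re

WORD = re.compile(r"[a-z]+")  # crude tokenizer: runs of letters only


def parse_definitions(lines):
    """Map each headword to the list of words used in its definition(s).

    Assumed (hypothetical) format: 'HEADWORD definition text' on each line.
    """
    definitions = {}
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        head = parts[0].lower()
        text = parts[1].lower() if len(parts) > 1 else ""
        definitions.setdefault(head, []).extend(WORD.findall(text))
    return definitions


sample = [
    "MEANIE A person with a mean disposition",
    "ATONIC an unaccented syllable or word",
]
print(parse_definitions(sample))
```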
Jon Corby wrote: Have you got a "gut-feel" for the number? The most popular guess seemed to be "quite few, less than a hundred" which seems ludicrously low to me, but I haven't really the foggiest.
With regard to the aforementioned CSW definitions, I'd say a lot more than 100. For example, the definition of MEANIE is "A person with a mean disposition", and ATONIC is "an unaccented syllable or word". The inclusion of words such as disposition and unaccented easily puts us into the 1000s. If you look at the Simple English Wikipedia, it's based on Basic English, a simplified flavour of English developed in the 1920s that has a core vocabulary of around 850 words. That core does not include technical terms, so it would be very difficult to write a dictionary with fewer words and still retain appropriate meaning in the definitions.
Last edited by Simon Myers on Thu Sep 03, 2009 4:41 pm, edited 1 time in total.
Ray Folwell
Acolyte
Posts: 153
Joined: Tue Jan 22, 2008 5:46 pm

Re: How many words would you need to know...

Post by Ray Folwell »

Another point to consider is that you probably don't need to understand every word in a definition to understand the meaning of a word.
For example: Chambers defines "elephant" as "a mammal (genus Elephas) of the order Proboscidea" and "Proboscidea" as "the elephant order of mammals", which is an immediate circular reference. But it then goes on to say other, more useful, things about elephants, such as that they have a thick skin, trunk and tusks.
Peter Mabey
Kiloposter
Posts: 1123
Joined: Sat Mar 01, 2008 3:15 pm
Location: Harlow

Re: How many words would you need to know...

Post by Peter Mabey »

There are even self-referential definitions, such as this from Chambers:

haoma /hōˈma or howˈma/ (plural haomas)
Etymology: Avestan; see soma²
noun: a drink prepared from the haoma vine, used in Zoroastrian ritual; (with cap) a deity, personification of haoma
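An entry like this is a cycle of length one: under the strict rule that every word in a definition must already be understood, haoma can never be learned from its own entry, so it has to be known in advance. Such entries are easy to flag. A minimal sketch over a hypothetical headword-to-words mapping, with the haoma entry trimmed and a made-up second entry for contrast:

```python
def self_referential(definitions):
    """Headwords whose own definition uses the headword (self-loops)."""
    return sorted(w for w, used in definitions.items() if w in used)


definitions = {
    "haoma": ["a", "drink", "prepared", "from", "the", "haoma", "vine"],
    "vine": ["a", "climbing", "plant"],  # toy entry, not from Chambers
}
print(self_referential(definitions))  # ['haoma']
```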
David Williams
Kiloposter
Posts: 1269
Joined: Wed Jan 30, 2008 9:57 pm

Re: How many words would you need to know...

Post by David Williams »

Ray Folwell wrote:Another point to consider is that you probably don't need to understand every word in a definition to understand the meaning of a word.
For example: Chambers defines "elephant" as "a mammal (genus Elephas) of the order Proboscidea" and "Proboscidea" as "the elephant order of mammals", which is an immediate circular reference. But it then goes on to say other, more useful, things about elephants, such as that they have a thick skin, trunk and tusks.
Doesn't this just mean that either "elephant" or "Proboscidea" is one of the words in the basic list you have to understand?
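For a toy dictionary this can be checked exactly by brute force: try every starting set, smallest first, and keep those from which the whole dictionary can be learned. A sketch (exponential in the number of words, so only for tiny examples), using trimmed, hypothetical versions of the two Chambers entries with "mammal" left undefined; `learnable` and `minimum_starters` are illustrative names.

```python
from itertools import combinations


def learnable(definitions, starters):
    """Everything that can be learned from `starters` under the all-words-known rule."""
    known = set(starters)
    changed = True
    while changed:
        changed = False
        for word, used in definitions.items():
            if word not in known and set(used) <= known:
                known.add(word)
                changed = True
    return known


def minimum_starters(definitions):
    """All smallest starting vocabularies that unlock every word (brute force)."""
    words = set(definitions).union(*definitions.values())
    for size in range(len(words) + 1):
        hits = [set(c) for c in combinations(sorted(words), size)
                if learnable(definitions, c) >= words]
        if hits:
            return hits
    return []


definitions = {
    "elephant": ["mammal", "proboscidea"],
    "proboscidea": ["elephant", "mammal"],
}
print(minimum_starters(definitions))
```

Both minimal sets contain "mammal" (undefined in this toy version) plus exactly one of "elephant" and "proboscidea", which matches the point above.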
Kieran Child
Enthusiast
Posts: 355
Joined: Fri Mar 20, 2009 8:48 pm

Re: How many words would you need to know...

Post by Kieran Child »

The people who say "I saw your face before. It was in the dictionary under 'stupid'"
Use their dictionary. It has pictures. You don't need to know any words.
Michael Wallace
Racoonteur
Posts: 5458
Joined: Mon Jan 21, 2008 5:01 am
Location: London

Re: How many words would you need to know...

Post by Michael Wallace »

Kieran Child wrote:The people who say "I saw your face before. It was in the dictionary under 'stupid'"
I don't think I've ever had this happen, is it a common occurrence?
Brian Moore
Devotee
Posts: 582
Joined: Fri Feb 06, 2009 6:11 pm
Location: Exeter

Re: How many words would you need to know...

Post by Brian Moore »

Hmm, I wonder if the biggest sticking point will be the word 'understand'. Will the minimum definition of 'understand' be: giving just enough detail to differentiate the instance being defined from any other entry? But does that necessarily mean that you know what it is you're talking about?

Can you 'understand' what an elephant is from Ray's initial Chambers definition ("a mammal (genus Elephas) of the order Proboscidea") or do you need the supplementary bits? Or would "its a very big mammal wot lives in hot countries (here's a picture...)" lead to a better (or even minimum) 'understanding'?
Kieran Child
Enthusiast
Posts: 355
Joined: Fri Mar 20, 2009 8:48 pm

Re: How many words would you need to know...

Post by Kieran Child »

They say it in America (I assume). It's been on two American 'comedies'. It was also referenced by Ed Byrne.
Simon Myers
Enthusiast
Posts: 295
Joined: Sat Dec 13, 2008 12:41 am
Location: Stamford, Connecticut

Re: How many words would you need to know...

Post by Simon Myers »

Kieran Child wrote:They say it in America (I assume). It's been on two American 'comedies'. It was also referenced by Ed Byrne.
American dictionaries tend to have illustrations, which would explain those jokes and the general bemusement of a British audience.
Kieran Child
Enthusiast
Posts: 355
Joined: Fri Mar 20, 2009 8:48 pm

Re: How many words would you need to know...

Post by Kieran Child »

Americans are so cute.
Michael Wallace
Racoonteur
Posts: 5458
Joined: Mon Jan 21, 2008 5:01 am
Location: London

Re: How many words would you need to know...

Post by Michael Wallace »

Tbh, pictures seem like quite a sensible way to go - for one thing I bet it would make remembering some definitions easier.
Brian Moore
Devotee
Posts: 582
Joined: Fri Feb 06, 2009 6:11 pm
Location: Exeter

Re: How many words would you need to know...

Post by Brian Moore »

Michael Wallace wrote:Tbh, pictures seem like quite a sensible way to go - for one thing I bet it would make remembering some definitions easier.
Well, if Charlie works out we need, say, 10,000 words to understand everything in the dictionary, and a picture's worth a thousand words, then that means we'd only need ten pictures to be able to understand everything in the dictionary.

No, hold on, that can't be right....
JimBentley
Fanatic
Posts: 2820
Joined: Fri Jan 11, 2008 6:39 pm

Re: How many words would you need to know...

Post by JimBentley »

Charlie Reams wrote:Layman's overview: This is a really nice question.
This. My guess would be about 1,500 for a dictionary like the ODE, but it could well be a lot more as there's a lot of fairly obscure technical stuff in there.

Another question would be how large a dictionary could you compile using only an arbitrarily small pool of words, say 100 or 200, or 500?
Jon Corby
Moral Hero
Posts: 8021
Joined: Mon Jan 21, 2008 8:36 am

Re: How many words would you need to know...

Post by Jon Corby »

Brian Moore wrote:Hmm, I wonder if the biggest sticking point will be the word 'understand'. Will the minimum definition of 'understand' be: giving just enough detail to differentiate the instance being defined from any other entry? But does that necessarily mean that you know what it is you're talking about?
Nah, there's no intelligence of any sort being used here. We're starting with an initial vocabulary of n words (that we understand). "Understanding" a definition from the dictionary then simply means that we already have all the words in the definition in our vocabulary. If this is the case, that word gets added to our vocabulary. And so on.
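That rule is easy to simulate: keep sweeping the dictionary, adding every word whose definition is now fully known, until a sweep adds nothing. A sketch with hypothetical names and toy data; `definitions` maps each headword to the words in its definition, and undefined words can only ever come from the starting vocabulary. The same loop also bears on JimBentley's question: fix a small starting vocabulary and see how much of the dictionary it unlocks.

```python
def learn_in_rounds(definitions, vocabulary):
    """Return the final vocabulary and the words added on each sweep."""
    known = set(vocabulary)
    rounds = []
    while True:
        # Words whose every definition word is already in the vocabulary.
        new = sorted(w for w, used in definitions.items()
                     if w not in known and set(used) <= known)
        if not new:
            return known, rounds
        known.update(new)
        rounds.append(new)


definitions = {
    "word1": ["the", "word2"],
    "word2": ["a", "word3"],
    "word3": ["the", "a"],
}
known, rounds = learn_in_rounds(definitions, {"the", "a"})
print(rounds)  # [['word3'], ['word2'], ['word1']]
```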
David Williams
Kiloposter
Posts: 1269
Joined: Wed Jan 30, 2008 9:57 pm

Re: How many words would you need to know...

Post by David Williams »

I don't have the maths to tackle this, but here's a way to start.

First, find any words that appear in definitions, but are not themselves defined. (I think FORELEG was one such.) All of these need to be in our list.

Next, find all the words that are defined but do not appear in any definition. I'm guessing this is a pretty high proportion, and none of them are in the required list. Remove them, and their definitions.

Repeat the process as necessary.

Pass what remains to someone who knows what they're talking about.
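David's two steps can be sketched as follows (hypothetical names and toy data; a reduction step, not a full solution). Undefined-but-used words are collected once, since deleting a word that nothing uses can never create a new undefined one; unused words are then stripped repeatedly, because removing one can leave another unused.

```python
def prune(definitions):
    """Return (words that must be known, the reduced dictionary)."""
    defs = {w: set(used) for w, used in definitions.items()}
    # Step 1: words used in some definition but never defined themselves.
    must_know = set()
    for used in defs.values():
        must_know |= {u for u in used if u not in defs}
    # Step 2, repeated: drop defined words that no remaining definition uses.
    while True:
        used_anywhere = set().union(*defs.values())
        unused = [w for w in defs if w not in used_anywhere]
        if not unused:
            return must_know, defs
        for w in unused:
            del defs[w]


definitions = {
    "snurg": ["animal", "flurpweed"],
    "flurpweed": ["flower", "snurg"],
    "emu": ["bird"],
    "bird": ["feathered", "animal"],
    "animal": ["creature"],
    "flower": ["plant"],
}
must_know, remaining = prune(definitions)
print(sorted(must_know))  # ['creature', 'feathered', 'plant']
print(sorted(remaining))  # ['animal', 'flower', 'flurpweed', 'snurg']
```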
Jon Corby
Moral Hero
Posts: 8021
Joined: Mon Jan 21, 2008 8:36 am

Re: How many words would you need to know...

Post by Jon Corby »

I agree, that is a good starting point, David. I think I remember seeing DOORLESS and HOMEPAGE as two words that are used in definitions in the ODE2r but are not themselves defined. I don't think there are that many of them, but your second step must surely be very beneficial.
Charlie Reams
Site Admin
Posts: 9494
Joined: Fri Jan 11, 2008 2:33 pm
Location: Cambridge

Re: How many words would you need to know...

Post by Charlie Reams »

I think I figured out the correct way to do it last night. I might have a go at it, at some point.
Jon Corby
Moral Hero
Posts: 8021
Joined: Mon Jan 21, 2008 8:36 am

Re: How many words would you need to know...

Post by Jon Corby »

Charlie Reams wrote:I think I figured out the correct way to do it last night. I might have a go at it, at some point.
Can you explain it in a way that I'd understand?
Charlie Reams
Site Admin
Posts: 9494
Joined: Fri Jan 11, 2008 2:33 pm
Location: Cambridge

Re: How many words would you need to know...

Post by Charlie Reams »

Jon Corby wrote:
Charlie Reams wrote:I think I figured out the correct way to do it last night. I might have a go at it, at some point.
Can you explain it in a way that I'd understand?
Maybe. You can think of it like a map, with the words as locations and a road from A to B if word A uses word B in its definition (they're one-way roads, so the direction is important). Then you look for cycles in the map, which correspond to sets of words which are somehow defined in terms of each other; I gave an example of a 2-word cycle above, but in general they could be any length. You need to know at least one word in every such cycle. The algorithm is then:

Repeat until the map is empty:
1) Output any locations with no outgoing roads, and delete them. Also delete (but don't output) any locations all of whose paths go through the locations you just deleted.
2) Find the location which features in the largest number of cycles; output and delete it, and again delete but don't output any locations all of whose paths go through the location you just deleted.

The list of words you outputted is the list of words you need to know.

I think this is correct but I haven't spent enough time to really convince myself yet.
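Here is one reading of that procedure as a Python sketch; it is not Charlie's code. Two interpretive assumptions are mine: "delete (but don't output) any locations all of whose paths go through the locations you just deleted" is taken to mean "words that have now become learnable", and cycles are counted by brute-force enumeration of simple cycles, which is exponential and only practical for small graphs. Ties go to the alphabetically first word. Being greedy, it always returns a list that works, but with no guarantee that it is the smallest one.

```python
from collections import Counter


def learn_closure(graph, known):
    """Add every word whose definition words are all known, until nothing changes."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for word, used in graph.items():
            if word not in known and used and used <= known:
                known.add(word)
                changed = True
    return known


def cycle_counts(graph, nodes):
    """For each word in `nodes`, count the simple cycles through it inside `nodes`.

    Each cycle is found exactly once, starting from its alphabetically smallest word.
    """
    rank = {v: i for i, v in enumerate(sorted(nodes))}
    counts = Counter()

    def extend(start, v, path):
        for w in graph[v]:
            if w == start:
                counts.update(path)  # closed a cycle: credit every word on it
            elif w in rank and rank[w] > rank[start] and w not in path:
                path.append(w)
                extend(start, w, path)
                path.pop()

    for s in rank:
        extend(s, s, [s])
    return counts


def greedy_need_to_know(definitions):
    graph = {w: set(used) for w, used in definitions.items()}
    for used in definitions.values():
        for u in used:
            graph.setdefault(u, set())  # undefined words: no outgoing roads
    # Step 1: locations with no outgoing roads must be known.
    chosen = sorted(w for w, used in graph.items() if not used)
    known = learn_closure(graph, chosen)
    # Step 2, repeated: output the word on the most cycles among what is left.
    while len(known) < len(graph):
        left = set(graph) - known
        counts = cycle_counts(graph, left)
        best = max(sorted(left), key=lambda w: counts[w])
        chosen.append(best)
        known = learn_closure(graph, known | {best})
    return chosen


definitions = {
    "snurg": ["animal", "flurpweed"],
    "flurpweed": ["flower", "snurg"],
    "animal": ["living", "thing"],
    "flower": ["living", "thing"],
}
print(greedy_need_to_know(definitions))  # ['living', 'thing', 'flurpweed']
```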
David Williams
Kiloposter
Posts: 1269
Joined: Wed Jan 30, 2008 9:57 pm

Re: How many words would you need to know...

Post by David Williams »

There's something about closed loops as well. If A and B are both in the dictionary, and both appear only in the definition of the other, then one, and only one, of them has to be in the list. You can extend this to groups of three self-contained words, and so on. And as all the words left in the dictionary after my first exercise are also in other definitions, and there's a finite number of words, then all words are in a closed loop.

(I went to post this and found Charlie had put his method in. I think we may be saying much the same, but I'll post it anyway. If we are saying the same, seeing it two ways might make it easier to understand. If not, pointing out where I'm wrong might also help!)
Charlie Reams
Site Admin
Posts: 9494
Joined: Fri Jan 11, 2008 2:33 pm
Location: Cambridge

Re: How many words would you need to know...

Post by Charlie Reams »

David Williams wrote:all words are in a closed loop.
They're all in some closed loop, not necessarily the same one. Maybe this is what you meant.
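Closed loops in this sense are what graph theory calls strongly connected components: maximal groups of words where each can be reached from every other by following definitions. A sketch using Tarjan's algorithm (recursive, so fine for toy data, but a full dictionary would need an iterative version or a raised recursion limit); the names and entries are hypothetical. A word can be used by a loop without sitting on one itself, so the sketch only reports components that actually contain a cycle, and a component with several interlocking cycles may need more than one known word.

```python
def closed_loops(definitions):
    """Strongly connected components that contain at least one cycle (Tarjan)."""
    graph = {w: list(used) for w, used in definitions.items()}
    for used in definitions.values():
        for u in used:
            graph.setdefault(u, [])
    index, low = {}, {}
    stack, on_stack, loops = [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            # Keep it only if it is really a loop: 2+ words, or a self-reference.
            if len(comp) > 1 or v in graph[v]:
                loops.append(sorted(comp))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return loops


definitions = {
    "snurg": ["animal", "flurpweed"],
    "flurpweed": ["flower", "snurg"],
    "haoma": ["drink", "haoma"],
    "elephant": ["mammal", "proboscidea"],
    "proboscidea": ["elephant", "mammal"],
    "emu": ["bird"],
}
print(sorted(closed_loops(definitions)))
```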
David Williams
Kiloposter
Posts: 1269
Joined: Wed Jan 30, 2008 9:57 pm

Re: How many words would you need to know...

Post by David Williams »

Yes. But I'm less happy than I was about the significance once you get wider than two words that appear only in each other's entry.
Jon Corby
Moral Hero
Posts: 8021
Joined: Mon Jan 21, 2008 8:36 am

Re: How many words would you need to know...

Post by Jon Corby »

Charlie Reams wrote:I might have a go at it, at some point.
Did you do this today?