Page 1 of 1

Word search algorithms

Posted: Fri Aug 09, 2024 4:38 pm
by Fiona T
Thought this would benefit from a separate thread...
Graeme Cole wrote: Wed Aug 07, 2024 12:01 pm
Fiona T wrote: Wed Aug 07, 2024 7:21 am
Graeme Cole wrote: Wed Aug 07, 2024 12:25 am You might want to exclude the scramble itself from the possible solutions. It's valid for the scramble to be a valid word if there's only one anagram of it, but if you feed the conundrum checker "RAUNCHIER HURRICANE", for example, it complains that there are two solutions.
yeah I considered that but figured you could think "yes I know" :) But yeah easy enough to exclude it. (Edit - done)
Graeme Cole wrote: Wed Aug 07, 2024 12:25 am
That sounds optimisable. What does it do to check a conundrum?
It's probably a bit quicker than that but still slower than I'd like.

It's basically doing the same as I've done with RoboRiley - I've got a trie structure with the dictionary in, I then get all the permutations (362880 for 9 letters without repeats) stick those in a trie and compare tries. Tries are pretty efficient - I think it's the permutations thing that needs optimising - will have a google...
Yep, generating all the permutations of a 9 letter word is the red flag here.

If you've already got a list of all the valid words, you could arrange the letters within each valid word in alphabetical order, then build a sorted list of the resulting strings (or put them in a trie, if you prefer). For example, ACDEINOTU would appear in the list three times, consecutively. Now you only have to check one permutation of the scramble, look it up in your pre-computed sorted list, and count how many times it appears.

There are ways of improving on this (e.g. include the count in the list like ["ACDEINOTU", 3] rather than repeating the strings), but the main thing is you don't have to generate every permutation of the scramble.
Fiona T wrote: Thu Aug 08, 2024 9:46 am
Graeme Cole wrote: Wed Aug 07, 2024 7:17 pm What if you just iterate over the whole dictionary, checking every valid word to see if it's an anagram of the scramble? (Or, for the general case, that the word is available from the scramble?) I think even that would still be faster than checking every permutation of the scramble to see if it's a valid word. Checking if one string is an anagram of another is basically the same algorithmic complexity as finding a string in a trie, but there are four times as many permutations of 9 letters as there are 1-9 letter words in the dictionary.
Yeah but I'm not checking 9! words - by putting the permutations into a trie, if the scramble is ABCXEANRI once CX is ruled out, then all the words that start with CX are ruled out etc... Anyway - probably sidetracking this thread from its original purpose - will have a play and do some comparisons.


Anyway - for the conundrum checker I did as Graeme suggested and created (from my dictionary trie) an alphagram dictionary with a count, and that worked beautifully.

Was reasonably happy with the RoboRiley word finder on 9 letter words, but as soon as you went to 10 there was a delay and 12 letters became very noticeable.

I was putting the permutations of the selection into a trie and comparing to the dictionary trie - the compare bit was super quick.
It occurred to me I didn't need to put the permutations into a trie - just adapt that code to directly search the dictionary trie and stop then when you get to a non-existent prefix so saving 1000s of iterations (e.g. when it hits CX it doesn't need to create any further iterations of that prefix). It works instantly now, and no noticeable lag with longer selections, so lots of future possibilities... :) Hyper event anyone...? ;)

Re: Word search algorithms

Posted: Fri Aug 09, 2024 9:53 pm
by JackHurst
The app uses a SQL DB with a large number of columns including aCount, bCount ... zCount

Getting every single available word from a selection is the. A single DB query. You just build a query to fetch all words where aCount of the DB row is LTE the number of As in the selection.

I think when I measured it, it solved a round in about 250ms. Fast enough for the app, but still relatively rubbish if you want a fast solver for doing analysis. Probably can be optimized for speed, but its reliable!

For a fast bulk solver, I also used tries. The objective here is slightly different, you want to just get maxes for many rounds as quickly as possible. Running it on my M2 MacBook Pro it solved 10 million rounds in I think around 45 mins. Which I think is around 0.27ms per round! I used chatPGT to write the code.

Re: Word search algorithms

Posted: Sat Aug 10, 2024 6:10 am
by Fiona T
Yeah I haven't run any volume tests, but picking a 9 letter selection without duplicates - RELATIONS, a 9 with duplicates - TESTICLES, and some 10s without duplicates, I'm getting these times which are certainly good enough for the purposes I'm using for at present - this is to return the full list of words that can be made from the selection.

Thinking about wildcards, I think it would be pretty simple to adapt the algorithm slightly to keep track of whether the wildcard has been used and return all the nodes at that iteration if not. Might have a play later ...

Code: Select all

06:50:32:903	Selection: RELATIONS
06:50:32:903	Elapsed ms: 0.6763 
06:50:37:908	Selection: TESTICLES
06:50:37:908	Elapsed ms: 0.1317 
06:50:42:451	Selection: RELATIONS
06:50:42:451	Elapsed ms: 0.6183
06:50:46:967	Selection: TESTICLES
06:50:46:967	Elapsed ms: 0.1672 
06:54:31:241	Selection: RELATIONSH
06:54:31:241	Elapsed ms: 1.9628 
06:54:36:992	Selection: RELATIONSF
06:54:36:992	Elapsed ms: 1.1929 
I haven't used a database - for RR I wanted to keep the distribution as simple as possible, so have serialised the trie so the file that's distributed looks a bit like this... (Just for the record, Maus and Phil both have the instructions for creating the file from a word list in the event that I walk under a bus!)

Code: Select all

A.A.PA.S.)))RDVARK.S.)))))WOLF.)VES.)))))))GH.)))SVOEL.S.)))GEL.S.))))))))BACA.S.))K.)US.ES.))))) ... etc

Re: Word search algorithms

Posted: Sat Aug 10, 2024 1:14 pm
by Fiona T
Fiona T wrote: Sat Aug 10, 2024 6:10 am Yeah I haven't run any volume tests, but picking a 9 letter selection without duplicates - RELATIONS, a 9 with duplicates - TESTICLES, and some 10s without duplicates, I'm getting these times which are certainly good enough for the purposes I'm using for at present - this is to return the full list of words that can be made from the selection.

Thinking about wildcards, I think it would be pretty simple to adapt the algorithm slightly to keep track of whether the wildcard has been used and return all the nodes at that iteration if not. Might have a play later ...

Code: Select all

06:50:32:903	Selection: RELATIONS
06:50:32:903	Elapsed ms: 0.6763 
06:50:37:908	Selection: TESTICLES
06:50:37:908	Elapsed ms: 0.1317 
06:50:42:451	Selection: RELATIONS
06:50:42:451	Elapsed ms: 0.6183
06:50:46:967	Selection: TESTICLES
06:50:46:967	Elapsed ms: 0.1672 
06:54:31:241	Selection: RELATIONSH
06:54:31:241	Elapsed ms: 1.9628 
06:54:36:992	Selection: RELATIONSF
06:54:36:992	Elapsed ms: 1.1929 
I haven't used a database - for RR I wanted to keep the distribution as simple as possible, so have serialised the trie so the file that's distributed looks a bit like this... (Just for the record, Maus and Phil both have the instructions for creating the file from a word list in the event that I walk under a bus!)

Code: Select all

A.A.PA.S.)))RDVARK.S.)))))WOLF.)VES.)))))))GH.)))SVOEL.S.)))GEL.S.))))))))BACA.S.))K.)US.ES.))))) ... etc
Simpler than I thought. Just treat the wildcard ? like a letter and return all the available nodes when the letter == '?'

Still nice and quick for human use (although measurably slower) - you can try it here

https://focaltools.azurewebsites.net/Fo ... ecker.aspx

will process up to two wildcards (??) - any more will be ignored. Result display restricted to 500 words cos crazy long list was causing browser issues!

Code: Select all

14:07:08:053	Selection: RELATIONS
14:07:08:053	Elapsed ms: 1.6884
14:07:27:080	Selection: RELATIONS?
14:07:27:080	Elapsed ms: 45.4602
14:07:58:999	Selection: RELATIONS??
14:07:59:250	Elapsed ms: 227.1725
14:08:19:911	Selection: REFLATIONS??
14:08:20:160	Elapsed ms: 268.2098
Haven't got a practical use for it yet tho! Hypergoat co-event...? :D

Re: Word search algorithms

Posted: Sat Aug 10, 2024 2:29 pm
by Graeme Cole
JackHurst wrote: Fri Aug 09, 2024 9:53 pm The app uses a SQL DB with a large number of columns including aCount, bCount ... zCount

Getting every single available word from a selection is the. A single DB query. You just build a query to fetch all words where aCount of the DB row is LTE the number of As in the selection.

I think when I measured it, it solved a round in about 250ms. Fast enough for the app, but still relatively rubbish if you want a fast solver for doing analysis. Probably can be optimized for speed, but its reliable!

For a fast bulk solver, I also used tries. The objective here is slightly different, you want to just get maxes for many rounds as quickly as possible. Running it on my M2 MacBook Pro it solved 10 million rounds in I think around 45 mins. Which I think is around 0.27ms per round! I used chatPGT to write the code.
You can extend the trie solution to find all words from a selection, not just maxes.

Suppose you have a pre-computed trie with all the words in the dictionary in it. Nodes representing a complete word are marked as such. Given a selection, you want to find all the valid words available from that selection.

For example, suppose your selection is PTTAEEMLT. This can equivalently be expressed as a frequency count for each letter:

Code: Select all

{
    "A" : 1,
    "E" : 2,
    "L" : 1,
    "M" : 1,
    "P" : 1,
    "T" : 3
}
All the other letters have an implied frequency of 0.

The idea is to walk the entire trie, skipping the branches you don't need. Descending down a branch "uses" a letter from your selection, and ascending back up "replaces" it. Ignore branches corresponding to letters you don't have in your selection, and keep track of the string of letters that got you from the root to where you currently are.

More explicitly:
  • Your selection frequency map starts as above, and the current word is the empty string ("").
  • Walk the trie in order (visit a node, then visit its children recursively), starting with the root, subject to the following rules.
  • Don't visit a node's child if that child's letter has a frequency of 0 in your map.
  • When you do descend to a node's child, subtract 1 from that letter's frequency, and add the letter to the current word.
  • When you've finished visiting a node and its children, and you ascend back up to its parent, remove the last letter from the current word and add 1 back on to that letter's frequency.
  • When you visit a node representing a complete word, emit the current word.
This emits all words available from the selection.

You could even do this directly on Fiona's serialised representation without having to build the trie from it in memory. If I'm not mistaken, that serialised representation is an in-order traversal of the entire trie: A-Z means "create a child node from the current node, with this letter, and descend into it", a dot means "this node is a complete word" and a closing bracket means "we've finished the current node, ascend back up to its parent".

Therefore, traversing the whole trie directly from that representation is easy - you just need to keep track of the letters between the root and where you currently are. However, you can't easily skip a large subtree because you still have to step through the parts you don't need, keeping track of how deep you are in the tree. Still, it might be advantageous to do it this way if you only wanted to check the maxes for one selection and didn't want to spend time building the structured trie in memory.

Re: Word search algorithms

Posted: Sat Aug 10, 2024 2:55 pm
by Fiona T
My implementation stores the word against the appropriate node (represented by a . on my serialisation) which avoids the need to keep track of where you are.

After a lot of messing, my code currently looks like this

Code: Select all

        static void GenWords(Trie.Node dictionaryNode, string selection, List<string> wordsFound, bool firstPass = true)
        {
            if (firstPass)
            {
                // Put into alphagram on first pass
                selection = String.Concat(selection.ToString().ToUpper().OrderBy(c => c));
            }
            for (int i = 0; i < selection.Length; i++)
            {
                if (i==0 || (i > 0 && selection[i] != selection[i - 1]))  // No need to repeat the search for repeated letters
                {
                    var letter = selection[i];
                    {
                        foreach (var edge in dictionaryNode.Edges.Where(e=> e.key == letter || letter=='?'))
                        {
                            if (letter == '?' || edge.key == letter)  // ? = wildcard - add all letters
                            {
                                // If it's a word, add to the list
                                if (edge.IsWord)
                                {
                                    wordsFound.Add(edge.Word);
                                }

                                // Remove this letter from the selection and repeat...
                                int index = selection.IndexOf(letter);
                                string newselection = (index < 0)
                                    ? selection
                                : selection.Remove(index, 1);

                                if (!String.IsNullOrEmpty(selection))
                                {
                                    GenWords(edge, newselection, wordsFound, false);
                                }
                            }
                        }
                    }
                }
            }
The main difference adding the wildcard has made is not doing dictionaryNode.Find on the key but it doesn't seem to have made enough of a performance impact to make it worth duplicating the code for ? or letter.

Re: Word search algorithms

Posted: Sun Aug 11, 2024 4:01 pm
by Callum Todd
Fiona T wrote: Fri Aug 09, 2024 4:38 pm Hyper event anyone...? ;)
I'm considering this for 2026, or late 2025 if there's a gap in the calendar.

Love this Countdowner's programming corner thread :-)

Re: Word search algorithms

Posted: Sun Aug 11, 2024 10:32 pm
by Andres Sanchez
I don't know if anybody else here is skilled with Google Sheets but I'd be interested in making a solver in Google Sheets. I have a mockup version of Countdown that I haven't run in a while, and I think I'd want to try improving on it a bit.

EDIT: I do kinda realize IMPORTHTML formulae could work but could somebody input a string into a link like that?

Re: Word search algorithms

Posted: Wed Aug 14, 2024 7:59 am
by Fiona T
Andres Sanchez wrote: Sun Aug 11, 2024 10:32 pm I don't know if anybody else here is skilled with Google Sheets but I'd be interested in making a solver in Google Sheets. I have a mockup version of Countdown that I haven't run in a while, and I think I'd want to try improving on it a bit.

EDIT: I do kinda realize IMPORTHTML formulae could work but could somebody input a string into a link like that?
Don't know much about google sheets, but I have created an API that returns maxes from a selection - it's intended for use one word at a time, not for smashing through 1000s of scrambles (my daily free usage limit would quickly get exhausted!) Can PM you details if you want to take a look.

Re: Word search algorithms

Posted: Thu Aug 15, 2024 1:34 am
by Andres Sanchez
Fiona T wrote: Wed Aug 14, 2024 7:59 am
Andres Sanchez wrote: Sun Aug 11, 2024 10:32 pm I don't know if anybody else here is skilled with Google Sheets but I'd be interested in making a solver in Google Sheets. I have a mockup version of Countdown that I haven't run in a while, and I think I'd want to try improving on it a bit.

EDIT: I do kinda realize IMPORTHTML formulae could work but could somebody input a string into a link like that?
Don't know much about google sheets, but I have created an API that returns maxes from a selection - it's intended for use one word at a time, not for smashing through 1000s of scrambles (my daily free usage limit would quickly get exhausted!) Can PM you details if you want to take a look.
Would be very much appreciated.

Re: Word search algorithms

Posted: Thu Aug 15, 2024 6:57 pm
by JackHurst
Don't use Google sheets for this. That's like using a gun to chop down a tree.

Sure you can do it if you put a lot of time, bullets and effort in. But wouldn't you have been better off going to the store to buy a chainsaw and using that?

Learn some beginner python and get familiar with Chat gpt. Having countdown as a motivator to make a project in a beginner friendly language like Python is going to teach you some great skills to have for your future career.

Re: Word search algorithms

Posted: Thu Aug 15, 2024 8:11 pm
by Andres Sanchez
JackHurst wrote: Thu Aug 15, 2024 6:57 pm Don't use Google sheets for this. That's like using a gun to chop down a tree.

Sure you can do it if you put a lot of time, bullets and effort in. But wouldn't you have been better off going to the store to buy a chainsaw and using that?

Learn some beginner python and get familiar with Chat gpt. Having countdown as a motivator to make a project in a beginner friendly language like Python is going to teach you some great skills to have for your future career.
Is C# fine? I have a lot of friends that use Unity for their game show constructions so I feel I'd like to follow that train.

Re: Word search algorithms

Posted: Thu Aug 15, 2024 8:42 pm
by JackHurst
Andres Sanchez wrote: Thu Aug 15, 2024 8:11 pm
JackHurst wrote: Thu Aug 15, 2024 6:57 pm Don't use Google sheets for this. That's like using a gun to chop down a tree.

Sure you can do it if you put a lot of time, bullets and effort in. But wouldn't you have been better off going to the store to buy a chainsaw and using that?

Learn some beginner python and get familiar with Chat gpt. Having countdown as a motivator to make a project in a beginner friendly language like Python is going to teach you some great skills to have for your future career.
Is C# fine? I have a lot of friends that use Unity for their game show constructions so I feel I'd like to follow that train.
Oh. Is it important for you to have a shiny nice user interface? Like trying to make a replica of the TV show?

Re: Word search algorithms

Posted: Thu Aug 15, 2024 10:07 pm
by Fiona T
JackHurst wrote: Thu Aug 15, 2024 8:42 pm
Andres Sanchez wrote: Thu Aug 15, 2024 8:11 pm
JackHurst wrote: Thu Aug 15, 2024 6:57 pm Don't use Google sheets for this. That's like using a gun to chop down a tree.

Sure you can do it if you put a lot of time, bullets and effort in. But wouldn't you have been better off going to the store to buy a chainsaw and using that?

Learn some beginner python and get familiar with Chat gpt. Having countdown as a motivator to make a project in a beginner friendly language like Python is going to teach you some great skills to have for your future career.
Is C# fine? I have a lot of friends that use Unity for their game show constructions so I feel I'd like to follow that train.
Oh. Is it important for you to have a shiny nice user interface? Like trying to make a replica of the TV show?
That's kinda what RoboRiley is trying to do. In hindsight I probably wouldn't use winforms for the front end as some of the graphical elements are a PITA (particularly the lack of scalability and the absolute shite way it does transparency), but it's basically winforms and c# (too unmotivated to learn new/different technologies which is why I packed in doing it professionally!) RR started as something for me to host Reading with, then kinda grew as I realised it might be useful for others.

But yep, agree with Jack, you need something you care about as a motivator to get stuff working. I've had more fun playing with RR and this stuff after an 18 month break from coding than I remember having for a long time doing it as a career. Less well paid though ;)

Re: Word search algorithms

Posted: Thu Aug 15, 2024 11:21 pm
by Andres Sanchez
Oh I'm not doubting what Jack's saying at all. I agree that there needs to be a motivation, and that's why my friends have made said game show constructions; some even being original.

I'm just plainly asking if C# works fine with the reasoning being that most of the people I know are using it so why shouldn't I? Plus I've never really used Unity at all so I think I'd like to try it out.