I already did something similar. You cannot use a simple dictionary. The result will be messy. It depends if you only have to do this once or as whole program.
My solution was to:
- Connect to a database with working
words from a dictionary list (for
example online dictionary)
- Filter long and short words in dictionary and check if you want to trim stuff (for example don't use words with only one character like 'I')
- Start with short words and compare your bigString with the database dictionary.
Now you need to create a "table of possibility". Because a lot of words can fit into 100% but are wrong. As longer the word as more sure you are, that this word is the right one.
It is cpu intensive but it can work precise in the result.
So lets say, you are using a small dictionary of 10,000 words and 3,000 of them are with a length of 8 characters, you need to compare your bigString at start with all 3,000 words and only if result was found, it is allowed to proceed to the next word. If you have 200 characters in your bigString you need about (2000chars / 8 average chars) = 250 full loops minimum with comparation.
For me, I also did a small verification of misspelled words into the comparation.
example of procedure (don't copy paste)
Dim bigString As String = "helloworld.thisisastackoverflowtest!"
Dim dictionary As New List(Of String) 'contains the original words. lets make it case insentitive
dictionary.Add("Hello")
dictionary.Add("World")
dictionary.Add("this")
dictionary.Add("is")
dictionary.Add("a")
dictionary.Add("stack")
dictionary.Add("over")
dictionary.Add("flow")
dictionary.Add("stackoverflow")
dictionary.Add("test")
dictionary.Add("!")
For Each word As String In dictionary
If word.Length < 1 Then dictionary.Remove(word) 'remove short words (will not work with for each in real)
word = word.ToLower 'make it case insentitive
Next
Dim ResultComparer As New Dictionary(Of String, Double) 'String is the dictionary word. Double is a value as percent for a own function to weight result
Dim i As Integer = 0 'start at the beginning
Dim Found As Boolean = False
Do
For Each word In dictionary
If bigString.IndexOf(word, i) > 0 Then
ResultComparer.Add(word, MyWeightOfWord) 'add the word if found, long words are better and will increase the weight value
Found = True
End If
Next
If Found = True Then
i += ResultComparer(BestWordWithBestWeight).Length
Else
i += 1
End If
Loop