3

I am guessing the key of a less-simple simple substitution ciphertext. The rule that I evaluate the correctness of the key is number of english words in the putative decryption.

Are there any tools in java that can check the number of english words in a string. For example, "thefoitedstateswasat"-> 4 words "thefortedxyzstateswasathat"->5 words.

I loaded words list and using HashSet as a dictionay. As I dont know the inter-word spaces belong in the text, I can't validate words using simple dictionary.

Thanks.

SecureFish
  • 2,391
  • 7
  • 32
  • 43
  • The number of valid English words isn't a really good indicator: "tintint" contains more valid words than "rhythms", for instance. – hauntsaninja Feb 10 '11 at 08:53

3 Answers3

1

I gave an answer to a similar question here:

If a word is made up of two valid words

It has some Java-esque pseudocode in it that might be adaptable into something that solves this problem.

Community
  • 1
  • 1
templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
1

Sorry I'm new and does not have the rep to comment yet.

But wouldn't the code be very slow as the number of checks and permutations is very big?

I guess you just have to brute force your way through by using (n-1) words nested for loop. And then search the dictionary for each substring.

user607455
  • 479
  • 5
  • 18
  • 1
    @user607455- Yes, but if you think about it, you're asking how many ways there are to split a string into subwords. This is, in the worst case, equivalent to asking how many ways you can partition a string into substrings, and there are exponentially many ways to do this. The actual runtime isn't that bad, though, because the recursion stops quickly when it finds that what's currently being considered isn't a word. – templatetypedef Feb 10 '11 at 06:34
  • @templatetypedef The recursion isn't meant stop when it encounters a non-word: it doesn't matter if there's garbage in the middle, as long as there is a greater number of valid English words. – hauntsaninja Feb 10 '11 at 08:50
  • @hauntsaninja- Sorry, I meant the *current recursive call* stops whenever the *current word fragment* isn't valid. You're absolutely correct - you don't want to stop if you find a bad possibility! – templatetypedef Feb 10 '11 at 09:13
0

Surely there's a better way to test the accuracy of your key?

But that's not the point, here's what I'd do:

Using "quackdogsomethinggodknowswhat"

I'd have a recursive method where starting at the beginning of the string, I'd call a recursive method for all the words with which the subject string starts, in this case "qua", and "quack" with the string not containing the word ("dogsomethinggodknowswhat" for quack). Return whatever is greater: 1 + the greatest value returned out of all your method calls OR 0 + the method call for the string starting at index 1 ("uackdogsomethinggodknowswhat").

This would probably work best if you kept your wordlist in a tree of some sort.

If you need some pseudocode, ask!

hauntsaninja
  • 859
  • 7
  • 11