I am capturing natural language user input and I need to check it against a predefined "correct" version. This much is trivial, but I am unsure about how to handle variations in contractions in the English language.

Suppose I'm expecting the sentence "I'm positive you don't know what you're doing." The match needs to be exact, but I don't want to lock users into just one variation, as that would get frustrating fast.

So, should I manually enter every possible variation of that sentence as valid matches? Like so:

"I'm positive you don't know what you're doing."
"I am positive you don't know what you're doing."
"I am positive you do not know what you're doing."
"I am positive you do not know what you are doing."
"I'm positive you don't know what you are doing."
...

Etc, etc. Think of more complex sentences and you can see how maddening this gets.

Or, is there a programmatic way I could handle this? With Regex, JS, Ruby, or Rails (the tools I'm using)?

Any help appreciated, thanks.

San Diago
  • What about choosing key words and phrases - so in your example above, it would be **positive,do not know** and **doing** – user2182349 Apr 09 '17 at 01:28
  • Why don't you perform simple regex replacements before checking the sentence? Something like `\bdo not\b` => `don't`, `\bI am\b` => `I'm`, etc. – Casimir et Hippolyte Apr 09 '17 at 01:30
  • See [Javascript fuzzy search that makes sense](http://stackoverflow.com/questions/23305000/javascript-fuzzy-search-that-makes-sense) – guest271314 Apr 09 '17 at 01:34
  • @CasimiretHippolyte Good idea, I hadn't thought of that. It might work, thanks. – San Diago Apr 09 '17 at 01:51
  • @user2182349 Unfortunately it's important to check every word. – San Diago Apr 09 '17 at 01:52
  • @guest271314 I think that's a bit overkill, but if nothing pans out I'm gonna give it a look. Thanks for bringing up something I hadn't thought about. – San Diago Apr 09 '17 at 01:53

1 Answer

There can't be that many English contractions. I would store each variation as a key that points to the same value, like this (pseudo Ruby-esque, but of course it could be done with JS):

CONTRACTIONS = {
  "aren't"  => :arent,
  "are not" => :arent,
  # ...and so on for each contraction you want to accept
}

Then store the correct sentence using the shared values.

":im positive you :dont know what :youre doing"

When you receive an input, replace matched keys with their stored value, then check the converted sentence against the correct one, stored with the specially marked contractions.

(Note: for the few cases where you want to respond differently to distinct phrases that share the same contraction, make special provisions.)
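
A minimal Ruby sketch of the idea above, not a definitive implementation. The names `CONTRACTIONS`, `PATTERN`, `normalize`, and `matches?` are hypothetical, the table only covers the forms needed for the example sentence, and the sketch assumes that letter case, curly apostrophes, and repeated whitespace should be ignored. Both the stored sentence and the user input go through the same normalization, so the stored sentence can be written in any variation:

```ruby
# Hypothetical lookup table: every surface form of a contraction
# maps to one shared token. Extend it with the forms you need.
CONTRACTIONS = {
  "i'm"    => ":im",    "i am"    => ":im",
  "don't"  => ":dont",  "do not"  => ":dont",
  "you're" => ":youre", "you are" => ":youre",
  "aren't" => ":arent", "are not" => ":arent"
}.freeze

# One alternation of all keys, longest first so a longer form wins
# if two keys ever overlap. \b keeps matches on word boundaries.
PATTERN = /\b(?:#{CONTRACTIONS.keys
                              .sort_by { |key| -key.length }
                              .map { |key| Regexp.escape(key) }
                              .join("|")})\b/

# Assumption: case, curly apostrophes, and extra spaces do not matter.
def normalize(sentence)
  text = sentence.downcase.tr("\u2019", "'").gsub(/\s+/, " ").strip
  text.gsub(PATTERN) { |form| CONTRACTIONS.fetch(form) }
end

# Every remaining word must still match exactly after normalization.
def matches?(input, expected)
  normalize(input) == normalize(expected)
end

expected = "I'm positive you don't know what you're doing."
puts matches?("I am positive you do not know what you are doing.", expected)
puts matches?("I'm sure you don't know what you're doing.", expected)
```

The first call prints `true` and the second prints `false`, because only the contraction forms are treated as interchangeable. Ambiguous forms such as `'s` (which can stand for "is" or "has") are not handled here and would need sentence-specific keys.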

גלעד ברקן
  • I like this approach very much, it's very clever. I'm going to give it some time to maybe get more ideas because I'm off to bed now, but I'm probably gonna accept your answer. Thanks! – San Diago Apr 09 '17 at 01:56
  • @SanDiago thank you for your comment. We can all learn more from different ideas and answers. Nice question. – גלעד ברקן Apr 09 '17 at 01:59
  • The clitic `'s` can be appended to virtually any English noun as a contraction of "is" or "has". "That dog's got beautiful eyes." is an example of the second. Also, compare "John's not here." with "John isn't here." So it's not quite accurate to say that contractions can be easily enumerated, nor that they are unambiguous. – rici Apr 09 '17 at 06:00
  • @rici the contractions you allude to are part of spoken rather than written English. See this article, which says, "Contractions can occur after nouns, names, here, there and now and question words. These contractions are not considered appropriate in formal writing." (http://dictionary.cambridge.org/us/grammar/british-grammar/writing/contractions) Indeed, they could make this task slightly more interesting and challenging, depending on the OP's specification for a "correct sentence." – גלעד ברקן Apr 09 '17 at 10:26
  • @rici but since you brought them up... It wouldn't seem far-fetched to have a shared matching for `[name] + 's not` and `[name] + isn't`, and other similar examples. Since it's likely the OP has hard-coded specific sentences rather than attempted an English grammar AI, these could probably be attended to in the same vein as the others. – גלעד ברקן Apr 09 '17 at 10:47
  • That's a good point `rici` brings up, but `גלעד ברקן`'s method still applies. I don't mind creating keys specific to a sentence (it can't be avoided); it's the manual permutation approach that was giving me the creeps. This is a genius solution to it. I've accepted the answer, thanks again! – San Diago Apr 09 '17 at 14:17
  • Sadly, you might have to match `your` to both `:youre` and `:your`. – Eric Duminil Apr 09 '17 at 16:35
  • @EricDuminil not sure I follow - how is "your" a contraction? – גלעד ברקן Apr 09 '17 at 19:07
  • It's not a contraction but it's used by many people interchangeably and mistakenly with `you're`. – Eric Duminil Apr 09 '17 at 19:11
  • @EricDuminil if "your" isn't a contraction or the expansion of one, we wouldn't have to worry about hashing it to a corresponding shared value. – גלעד ברקן Apr 09 '17 at 20:15