1

I'm building a simple AI for a Telegram bot, and I'm currently trying to process the user's text and respond almost like a human would.

For example:

"I want to register"

As humans, we understand that the user wants to register.

So I'd process this text using JavaScript's indexOf to look for "want" and "register":

var user_text = message.text;
// Both keywords must appear somewhere in the message
if (user_text.indexOf('want') >= 0) {
    if (user_text.indexOf('register') >= 0) {
        console.log('He wants to register?');
    }
}

But what if the text contains "not" somewhere in the string? I'd end up with a zillion conditions for a zillion cases, and it would be tiring to write this kind of logic.

My question is: is there a more elegant way to do this? I don't really know what keyword to Google for this...

Charles Okwuagwu
rolodex

2 Answers

2

The concept you're looking for is natural language processing (NLP), which is a very broad field. Full NLP is intricate and complicated, with all kinds of issues.

I would suggest starting with a much simpler solution: split your input into words. You can do that using the String.prototype.split method with some tweaks. Filter out tokens that you don't care about and that don't contribute to the command, like "the", "a", "an". Take the remaining tokens and look for negation ("not", "don't") and keywords. You may need to combine adjacent tokens if you have two-word commands.

That could look something like:

var user_text = message.text;
var tokens = user_text.split(' '); // split on spaces, very simple "word boundary"
tokens = tokens.map(function (token) {
  return token.toLowerCase();
});

var remove = ['the', 'a', 'an'];
tokens = tokens.filter(function (token) {
  return remove.indexOf(token) === -1; // if remove array does *not* contain token
});

if (tokens.indexOf('register') !== -1) {
  // User wants to register
} else if (tokens.indexOf('enable') !== -1) {
  if (tokens.indexOf('not') !== -1) {
    // User does not want to enable
  } else {
    // User does want to enable
  }
}
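
The example above only looks at single words. For two-word commands, adjacent tokens can be joined into pairs before matching. This is a minimal sketch; the helper names bigrams and containsPhrase, and the "sign up" command, are made up for illustration:

```javascript
// Hypothetical helper: join each token with the one that follows it.
function bigrams(tokens) {
  const pairs = [];
  for (let i = 0; i < tokens.length - 1; i++) {
    pairs.push(tokens[i] + ' ' + tokens[i + 1]);
  }
  return pairs;
}

// Hypothetical helper: true if a one- or two-word phrase appears in the tokens.
function containsPhrase(tokens, phrase) {
  return tokens.includes(phrase) || bigrams(tokens).includes(phrase);
}

// "sign up" is an assumed two-word command, not something from the question.
const exampleTokens = ['please', 'sign', 'up', 'now'];
const wantsSignUp = containsPhrase(exampleTokens, 'sign up'); // true
```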

This is not a full solution: you will eventually want to run the string through a real tokenizer and potentially even a full parser, and may want to employ a rule engine to simplify the logic.
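
Short of a real tokenizer, the splitting step can at least be made to cope with punctuation, repeated spaces, and contractions such as "don't". The sketch below is only one way to do it; tokenize, isNegated, and the word lists are assumptions for illustration, not part of any library:

```javascript
// Assumed word lists; extend them for your own bot.
const STOP_WORDS = ['the', 'a', 'an'];
const NEGATIONS = ['not', "don't", 'dont'];

// Hypothetical helper: lowercase, strip punctuation, split on whitespace.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/\u2019/g, "'")        // normalize curly apostrophes
    .replace(/[^a-z0-9'\s]/g, ' ')  // drop punctuation, keep apostrophes
    .split(/\s+/)
    .filter((token) => token.length > 0 && !STOP_WORDS.includes(token));
}

// Hypothetical helper: true if any token is a known negation word.
function isNegated(tokens) {
  return tokens.some((token) => NEGATIONS.includes(token));
}

const sampleTokens = tokenize("I don't want to register!");
// sampleTokens is ['i', "don't", 'want', 'to', 'register']
const declined = sampleTokens.includes('register') && isNegated(sampleTokens); // true
```

Note that this treats a negation anywhere in the message as applying to the command, which will misread some sentences.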

If you can restrict the inputs you need to understand (a limited number of sentence forms and nouns/verbs), you can probably just use a simple parser with a few rules to handle most commands. Enforcing a predictable sentence structure with articles removed will make your life much easier.
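
As a rough illustration of such a restricted parser, the sketch below accepts only messages that contain a known verb, optionally followed by a known noun. The vocabulary and the parseCommand name are hypothetical and would need to match the commands your bot actually supports:

```javascript
// Hypothetical vocabulary; replace with your bot's own commands.
const VERBS = ['register', 'enable', 'disable'];
const NOUNS = ['notifications', 'account'];
const NEGATION_WORDS = ['not', "don't"];

// Hypothetical parser: returns { verb, noun, negated } or null if no verb is known.
function parseCommand(text) {
  const tokens = text
    .toLowerCase()
    .replace(/[^a-z'\s]/g, ' ') // drop punctuation and digits
    .split(/\s+/)
    .filter(Boolean);

  const verb = tokens.find((t) => VERBS.includes(t));
  if (!verb) {
    return null; // no known verb: not a command we understand
  }

  // Only look for the noun after the verb ("enable notifications").
  const rest = tokens.slice(tokens.indexOf(verb) + 1);
  const noun = rest.find((t) => NOUNS.includes(t)) || null;
  const negated = tokens.some((t) => NEGATION_WORDS.includes(t));

  return { verb, noun, negated };
}

const parsed = parseCommand('Please enable notifications.');
// parsed is { verb: 'enable', noun: 'notifications', negated: false }
```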

You could also take the example above and replace the filter with a whitelist (only include words that are known). That would leave you with a small set of known tokens, but introduces the potential to strip useful words and misinterpret the command, so you should confirm with the user before running anything.
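
A whitelist version with a confirmation step might look roughly like this. The word list, the function names, and the wording of the prompt are all assumptions for illustration:

```javascript
// Hypothetical whitelist: the only words the bot claims to understand.
const KNOWN_WORDS = ['register', 'enable', 'disable', 'not'];

// Keep only whitelisted words, in their original order.
function keepKnown(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((token) => KNOWN_WORDS.includes(token));
}

// Build a question to send back to the user before acting on anything.
function confirmationPrompt(text) {
  const known = keepKnown(text);
  if (known.length === 0) {
    return "Sorry, I didn't understand that.";
  }
  return 'Did you mean: "' + known.join(' ') + '"? (yes/no)';
}

const prompt = confirmationPrompt('Please do not enable the thing');
// prompt is 'Did you mean: "not enable"? (yes/no)'
```

Because everything outside the whitelist is dropped silently, the bot should only act once the user has answered the prompt.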

Community
ssube
  • I'm also interested in the way Siri, Cortana and Google confirm and understand our commands. Just like you said, I will confirm with the user on the command, and log the text to have my program learn the instruction the next time. I'll start small. Thanks again for the insight. Valuable! – rolodex Aug 14 '15 at 16:42
1

If you really want to parse and understand sentences expressed in natural language, you should look into the topic of natural language processing. This is often done with machine learning, for example some kind of neural network trained to "understand" different variations of sentences, because specifying all of the different syntactic and semantic rules of a language by hand appears to be an overwhelming task.

If, however, the number of variations of these sentences is limited, you could specify some rules in the form of commonly used word combinations. In the simplest case, regular expressions would probably do.
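
For the simplest case, a short ordered list of regular expressions can map a message to an intent. This is only a sketch: the patterns, the intent names, and matchIntent are invented for illustration, and more specific rules have to come first because the first match wins:

```javascript
// Hypothetical rules, checked in order; the first matching pattern wins.
const RULES = [
  { pattern: /\b(?:not|don't)\b.*\bregister\b/i, intent: 'decline_register' },
  { pattern: /\bregister\b/i, intent: 'register' },
];

// Return the intent of the first rule that matches, or null if none do.
function matchIntent(text) {
  const rule = RULES.find((r) => r.pattern.test(text));
  return rule ? rule.intent : null;
}

const intent = matchIntent("I don't want to register"); // 'decline_register'
```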

Forketyfork