
I'm building a chatbot in C# using AIML files. At the moment I have this AIML to process:

<aiml>
    <category>
        <pattern>a * is a *</pattern>
        <template>when a <star index="1"/> is not a <star index="2"/>?</template>
    </category>
</aiml>

I would like to do something like:

if (user_string == pattern_string) return template_string;

but I don't know how to tell the computer that the star character can be anything, and especially that it can be more than one word. I was thinking of doing it with regular expressions, but I don't have enough experience with them. Can somebody help me? :)

Dave Zych

2 Answers


Using Regex

using System.Linq;
using System.Text.RegularExpressions;

static bool TryParse(string pattern, string text, out string[] wildcardValues)
{
    // ^ and $ anchor the pattern, so the whole string must match.
    // Regex.Escape escapes the regex metacharacters in the pattern, turning each * into \*
    // (http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.escape(v=vs.110).aspx).
    // Each \* is then replaced with (.+), which captures one or more characters into match.Groups.
    var regexPattern = string.Format("^{0}$", Regex.Escape(pattern).Replace(@"\*", "(.+)"));

    var match = Regex.Match(text, regexPattern, RegexOptions.Singleline);
    if (!match.Success)
    {
        wildcardValues = null;
        return false;
    }

    // Skip the first group, since it holds the whole matched text.
    wildcardValues = match.Groups.Cast<Group>().Skip(1).Select(i => i.Value).ToArray();
    return true;
}

Sample usage

string[] wildcardValues;
if (TryParse("Hello *. * * to *", "Hello World. Happy holidays to all", out wildcardValues))
{
    // It's a match.
    // wildcardValues contains the values of the wildcards, which are
    // ['World', 'Happy', 'holidays', 'all'] in this sample.
}
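
To tie this back to the AIML category in the question, here is a minimal sketch of filling the template with the captured values. The class name AimlSketch and the Respond helper are hypothetical names made up for illustration, and the sketch assumes the template only uses tags of the form <star index="n"/>. TryParse is repeated so the block compiles on its own.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AimlSketch
{
    // Same technique as TryParse above, repeated so this sketch is self-contained.
    public static bool TryParse(string pattern, string text, out string[] wildcardValues)
    {
        var regexPattern = string.Format("^{0}$", Regex.Escape(pattern).Replace(@"\*", "(.+)"));
        var match = Regex.Match(text, regexPattern, RegexOptions.Singleline);
        if (!match.Success)
        {
            wildcardValues = null;
            return false;
        }

        wildcardValues = match.Groups.Cast<Group>().Skip(1).Select(g => g.Value).ToArray();
        return true;
    }

    // Hypothetical helper (not part of the answer): returns the filled-in template,
    // or null when the input does not match the pattern.
    public static string Respond(string pattern, string template, string input)
    {
        string[] stars;
        if (!TryParse(pattern, input, out stars))
        {
            return null;
        }

        // Replace each <star index="n"/> with the n-th wildcard value (1-based).
        // A tag whose index is out of range is left untouched.
        return Regex.Replace(template, "<star index=\"(\\d+)\"\\s*/>", m =>
        {
            int index = int.Parse(m.Groups[1].Value);
            return index >= 1 && index <= stars.Length ? stars[index - 1] : m.Value;
        });
    }
}
```

With the category from the question, AimlSketch.Respond("a * is a *", "when a <star index=\"1\"/> is not a <star index=\"2\"/>?", "a cat is a small animal") returns "when a cat is not a small animal?". Note that the second wildcard captured two words.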

By the way, you don't really need Regex for this; it's overkill. You could implement your own algorithm by splitting the pattern into tokens with string.Split and then finding each token in the text with string.IndexOf. Using Regex does result in shorter code, though.
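
A rough sketch of that Split/IndexOf idea, under some assumptions: the names SplitSketch and TryMatch are hypothetical, each * must capture at least one character (like (.+) in the Regex version), and the comparison is ordinal and case-sensitive. Unlike the Regex version, this takes the first occurrence of each literal piece and never backtracks, so on ambiguous inputs it can capture different values or miss a match that Regex would find.

```csharp
using System;
using System.Collections.Generic;

public static class SplitSketch
{
    public static bool TryMatch(string pattern, string text, out string[] wildcardValues)
    {
        wildcardValues = null;

        // n wildcards split the pattern into n + 1 literal pieces (some may be empty).
        string[] literals = pattern.Split('*');
        if (literals.Length == 1)
        {
            // No wildcard at all: plain comparison.
            if (text != pattern) return false;
            wildcardValues = new string[0];
            return true;
        }

        string first = literals[0];
        string last = literals[literals.Length - 1];
        if (!text.StartsWith(first, StringComparison.Ordinal)) return false;

        var values = new List<string>();
        int position = first.Length; // where the current wildcard's value starts

        // Locate each middle literal; the text before it is the wildcard's value.
        for (int i = 1; i < literals.Length - 1; i++)
        {
            // Search from position + 1 so the wildcard captures at least one character.
            if (position + 1 > text.Length) return false;
            int found = text.IndexOf(literals[i], position + 1, StringComparison.Ordinal);
            if (found < 0) return false;

            values.Add(text.Substring(position, found - position));
            position = found + literals[i].Length;
        }

        // The last wildcard takes everything up to the trailing literal.
        int end = text.Length - last.Length;
        if (end <= position || !text.EndsWith(last, StringComparison.Ordinal)) return false;

        values.Add(text.Substring(position, end - position));
        wildcardValues = values.ToArray();
        return true;
    }
}
```

For the sample above, SplitSketch.TryMatch("Hello *. * * to *", "Hello World. Happy holidays to all", out wildcardValues) succeeds with the same four values as the Regex version.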

LostInComputer
    RegEx may be overkill, but unless there is a huge performance issue, I don't see any reason for implementing a custom algorithm. – Johnny5 Dec 23 '13 at 20:29

Would something like this work for you?

Match match = Regex.Match(pattern_string, @"<pattern>a [^<]+ is a [^<]+</pattern>");
if (match.Success)
{
    // do something...
}

Here [^<]+ matches one or more characters that are not <.

If your * may contain a < character, you can use .+ instead of [^<]+.
This is risky, though: .+ matches one or more of any character (except a newline by default), and it is greedy, so it can run past the closing tag.
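
To illustrate the risk, here is a small sketch comparing the two forms. It assumes an input where two <pattern> elements sit on the same line, and it adds capture groups around the wildcards, which the snippet above does not have. The class name GreedySketch and its members are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedySketch
{
    // Assumed input: two pattern elements on a single line.
    public const string Input =
        "<pattern>a cat is a pet</pattern><pattern>a dog is a friend</pattern>";

    // Returns the text captured by the first group of the first match, or null if there is none.
    public static string FirstWildcard(string regexPattern)
    {
        Match match = Regex.Match(Input, regexPattern);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

GreedySketch.FirstWildcard(@"<pattern>a ([^<]+) is a ([^<]+)</pattern>") returns "cat", because [^<]+ cannot cross the first closing tag. GreedySketch.FirstWildcard(@"<pattern>a (.+) is a (.+)</pattern>") returns "cat is a pet</pattern><pattern>a dog", because the greedy .+ extends to the last " is a " on the line.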

Sabuj Hassan