-1

What I want to do is give out keywords (like Google auto suggest) by training the machine to learn from historical search(basically make the machine to remember which words usually come together when searching).

Could anyone tell me the basic idea of algorithm (you can point out the direction I need to research on) and how to start (some simple algorithms which could work but not effective)?

Kuan
  • 11,149
  • 23
  • 93
  • 201

1 Answers1

0

There are lots of string matching algorithms. For your purpose, I would recommend the Levenshtein algorithm. It is the algorithm that is used in Microsoft Word for auto correction. The algorithm returns the amount of mutations a string needs to process before it can reach another string.

Example code (source)

using System;

/// <summary>
/// Contains approximate string matching
/// </summary>
static class LevenshteinDistance
{
    /// <summary>
    /// Compute the distance between two strings.
    /// </summary>
    public static int Compute(string s, string t)
    {
    int n = s.Length;
    int m = t.Length;
    int[,] d = new int[n + 1, m + 1];

    // Step 1
    if (n == 0)
    {
        return m;
    }

    if (m == 0)
    {
        return n;
    }

    // Step 2
    for (int i = 0; i <= n; d[i, 0] = i++)
    {
    }

    for (int j = 0; j <= m; d[0, j] = j++)
    {
    }

    // Step 3
    for (int i = 1; i <= n; i++)
    {
        //Step 4
        for (int j = 1; j <= m; j++)
        {
        // Step 5
        int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;

        // Step 6
        d[i, j] = Math.Min(
            Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
            d[i - 1, j - 1] + cost);
        }
    }
    // Step 7
    return d[n, m];
    }
}
bytecode77
  • 14,163
  • 30
  • 110
  • 141
  • I realize this is a huge and difficult area, could you give me a general idea what Levenshtein algorithm trying to do before I make my determine to dig into it? Thanks – Kuan Apr 23 '15 at 00:00
  • Basically what I want to do is to remember keyword pair or group. I do not need very precise about the suggested keyword. Something I want to machine to remember is which keywords usually appear in a search together, just something like that. – Kuan Apr 23 '15 at 00:05
  • The Levenshtein algorithm returns the amount of changes a string has to go trough to become another. Example "aunt" -> "ant" returns 1, because it needs one character to be changed in order to become the sedond string. You need to compare the search string to the existing keywords. – bytecode77 Apr 23 '15 at 00:08
  • @ bytecode77 Thanks for explaning. Well, actually I am not looking for keyword similarity algorithm, I am looking for something can group them synmaticly, as I said above, I just want to find search keyword pairs or group which usually search in a query, just like, when search keyword A, usually will with Keyword B, or A and B usually used together to search – Kuan Apr 23 '15 at 00:17