1

I have two lists that look like this:

<List> ads
[0]
Headline = "Sony Ericsson Arc silver"
[1]
Headline = "Sony Ericsson Play R800I"


<List> feedItems
[0]
Headline = "Sony Ericsson Xperia Arc Silver"
[1]
Headline = "Sony Ericsson Xperia Play R800i Black"

What is the easiest way to create a new, third list containing the pairs of elements that share at least two words? Could this be done with LINQ?

The third list would look like this

[0]
AdHeadline = "Sony Ericsson Arc silver"
MatchingFeed  = "Sony Ericsson Xperia Arc Silver"
// etc

I've tried traversing the first list and calling Regex.Match, populating the third list whenever I find a match. I'm wondering what your preferred way of doing this would be, and also how to require a minimum of two matching words.

subZero
  • possible duplicate of [How can I measure the similarity between 2 strings?](http://stackoverflow.com/questions/1034622/how-can-i-measure-the-similarity-between-2-strings) – bummi Aug 12 '13 at 21:50
  • You should also take into consideration the possibility of spelling mistakes and the use of abbreviations. There are whole programming packages dedicated to this sort of thing. And I agree with Rawling that regex is not of any use for this kind of problem. – RenniePet Aug 13 '13 at 01:03

3 Answers

5

I'm not sure regular expressions bring anything to the party here. How about the following?

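// Assumes ads and feedItems are sequences of headline strings
// (e.g. ads.Select(a => a.Headline) if they are objects).
// Define a helper function to split a string into its words.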
Func<string, HashSet<string>> GetWords = s =>
    new HashSet<string>(
        s.Split(new[]{' '}, StringSplitOptions.RemoveEmptyEntries)
        );

// Pair up each string with its words. Materialize the second one as
// we'll be querying it multiple times.
var aPairs = ads.Select(a => new { Full = a, Words = GetWords(a) });
var fPairs = feedItems
                 .Select(f => new { Full = f, Words = GetWords(f) })
                 .ToArray();

// For each ad, select all the feeds that match more than one word.
// Then just select the original ad and feed strings.
var result = aPairs.SelectMany(
    a => fPairs
        .Where(f => a.Words.Intersect(f.Words).Skip(1).Any())
        .Select(f => new { AdHeadline = a.Full, MatchingFeed = f.Full })
    );
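
One thing to watch: the sample headlines differ in case ("silver" vs "Silver", "R800I" vs "R800i"), and a plain Intersect compares words case-sensitively. Below is a minimal sketch of a case-insensitive variant with a configurable threshold. The names HeadlineMatcher, Match and minWords are made up for illustration, and it assumes C# 7 tuples and headlines passed as plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeadlineMatcher
{
    // Splits a headline on spaces into a case-insensitive set of words.
    private static HashSet<string> GetWords(string s) =>
        new HashSet<string>(
            s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.OrdinalIgnoreCase);

    // Returns every (ad, feed) pair that shares at least minWords words.
    public static List<(string AdHeadline, string MatchingFeed)> Match(
        IEnumerable<string> ads, IEnumerable<string> feedItems, int minWords = 2)
    {
        // Materialize the feed word sets once; they are scanned for every ad.
        var feeds = feedItems
            .Select(f => (Full: f, Words: GetWords(f)))
            .ToArray();

        return ads
            .Select(a => (Full: a, Words: GetWords(a)))
            .SelectMany(a => feeds
                // HashSet.Contains uses the set's own comparer, so case is ignored.
                .Where(f => a.Words.Count(w => f.Words.Contains(w)) >= minWords)
                .Select(f => (AdHeadline: a.Full, MatchingFeed: f.Full)))
            .ToList();
    }
}
```

With the two ads and two feed items from the question, a threshold of 2 pairs every ad with every feed item, because "Sony" and "Ericsson" alone are enough. Calling Match(ads, feedItems, 3) keeps only the two intended pairs.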
Rawling
1

Interesting problem. You could attack it in many ways, but one good approach would be to build a list of manufacturers and strip those names from the incoming strings. Then build a lookup table of all the mobile models you're concerned about, and do a LINQ select on that table using the model number and the manufacturer (which you have already confirmed). Identifying which part is the manufacturer and which is the model number might make things easier for you.

Personally I wouldn't use regex; I'd build a generic phone model class and create a list of those. Also, if the phone data is entered manually, consider using the Levenshtein algorithm to tolerate typos.
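
For reference, here is a rough sketch of Levenshtein (edit) distance in C#, using the usual two-row dynamic-programming formulation. The class name Fuzzy and the RoughlyEqual helper with its maxEdits threshold are hypothetical, only there to show how the distance might be used for comparing individual words.

```csharp
using System;

public static class Fuzzy
{
    // Number of single-character insertions, deletions or substitutions
    // needed to turn a into b. Keeps only two rows of the DP table.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];

        // Turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // i deletions to reach the empty prefix of b
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                    prev[j - 1] + cost);                    // substitute or keep
            }
            var tmp = prev; prev = curr; curr = tmp; // reuse the row buffers
        }
        return prev[b.Length];
    }

    // Hypothetical helper: two words count as the same if they are
    // within maxEdits of each other, ignoring case.
    public static bool RoughlyEqual(string a, string b, int maxEdits = 1) =>
        Levenshtein(a.ToLowerInvariant(), b.ToLowerInvariant()) <= maxEdits;
}
```

Something like Fuzzy.RoughlyEqual("Ericsson", "Ericson") could then stand in for an exact word comparison. What threshold is sensible depends on the data, and short words such as model numbers probably need a stricter one.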

wonea
1

There are definitely more efficient ways to do this, but here's something to start you off.

using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    private static void Main()
    {
        var ads = new[]
        {
            "Sony Ericsson Arc silver",
            "Sony Ericsson Play R800I",
            "Oneword",
        };

        var feedItems = new[]
        {
            "Sony Ericsson Xperia Arc Silver",
            "Nokia Lumia 900",
            "Sony Ericsson Xperia Play R800i Black",
        };

        var results = from ad in ads
                      from feedItem in feedItems
                      where isMatch(ad, feedItem)
                      select new
                      {
                          AdHeadline = ad,
                          MatchingFeed = feedItem,
                      };

        foreach (var result in results)
        {
            Console.WriteLine(
                "AdHeadline = {0}, MatchingFeed = {1}",
                result.AdHeadline,
                result.MatchingFeed
            );
        }
    }

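    // Returns true when the two headlines share at least two words,
    // ignoring case and the manufacturer names listed below.
    public static bool isMatch(string ad, string feedItem)
    {
        var manufacturerWords = new[] { "sony", "ericsson", "nokia" };

        ad = ad.ToLower();
        feedItem = feedItem.ToLower();

        // Split on runs of non-word characters, drop the empty strings that
        // Regex.Split yields for leading or trailing punctuation, then drop
        // the manufacturer names (Except also removes duplicates).
        var adWords = Regex.Split(ad, @"\W+")
            .Where(w => w.Length > 0)
            .Except(manufacturerWords);
        var feedItemWords = Regex.Split(feedItem, @"\W+")
            .Where(w => w.Length > 0)
            .Except(manufacturerWords);

        // Count the ad words that also appear in the feed item.
        var isMatch = adWords.Count(feedItemWords.Contains) >= 2;
        return isMatch;
    }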
}
Damian Powell