8

I'm looking to use regex in C# to search for terms, and I want the search to include the plurals of those terms. For example, if the user searches for 'pipe', I want to return results for 'pipes' as well.

So I can do this...

string s = "\\b" + term + "s*\\b";
if (Regex.IsMatch(bigtext, s)) {  /* do stuff */ }

How would I modify the above to allow me to match, say, 'stresses' when the user enters 'stress' and still work for 'pipe'/'pipes'?

SAL
  • Sergi - I hang my head in shame... I shall revisit my old questions and sort it out! sch - Not too bothered about the oddities that English allows... I think that trapping all of those would be a very big project. – SAL Apr 24 '12 at 11:47

3 Answers

9

The problem you can face is that there are a lot of irregular nouns such as man, fish and index. So you should consider using the PluralizationService that has a Pluralize method. Here is an example that shows how to use it.

After you get the plural of the term, you can easily construct a regex that matches either the singular or the plural form.

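// PluralizationService lives in the System.Data.Entity.Design assembly.
using System.Data.Entity.Design.PluralizationServices;
using System.Globalization;
using System.Text.RegularExpressions;

PluralizationService ps = PluralizationService.CreateService(CultureInfo.CurrentCulture);
string plural = ps.Pluralize(term);
// Escape both forms and anchor on word boundaries, as in the question.
string s = @"\b(" + Regex.Escape(term) + "|" + Regex.Escape(plural) + @")\b";
if (Regex.IsMatch(bigtext, s)) {
    /* do stuff */
}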
sch
  • This is quite important especially when dealing with different cultures! I don't think there is (or should be) a catchall regex. – Nate-Wilkins Oct 21 '14 at 04:28
2

Here's a regex created to remove the plurals:

 /(?<![aei])([ie][d])(?=[^a-zA-Z])|(?<=[ertkgwmnl])s(?=[^a-zA-Z])/g

(Demo & source)

I know it's not exactly what you need, but it may point you in the right direction.
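For the narrower case in the question (regular plurals ending in -s or -es only), one possible sketch is to make the suffix optional after the escaped term. The class and method names here are hypothetical, and irregular plurals such as man/men are deliberately not handled.

```csharp
using System.Text.RegularExpressions;

public static class PluralSearch
{
    // Hypothetical helper: true if text contains term, term + "s" or term + "es"
    // as a whole word. Irregular plurals are not covered.
    public static bool ContainsTermOrRegularPlural(string text, string term)
    {
        // Regex.Escape guards against terms that contain regex metacharacters.
        string pattern = @"\b" + Regex.Escape(term) + @"(?:es|s)?\b";
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }
}
```

With this pattern 'stress' matches 'stresses' and 'pipe' matches 'pipes', while 'pipe' does not match 'piper' because of the trailing word boundary.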

ThdK
  • Thanks ThdK - http://gskinner.com/RegExr/ is a brilliant way to test out regex expressions. – SAL Apr 24 '12 at 11:50
  • I just found it recently; I had never heard of it before. It already has a lot of great regexes created by the community, and if they are not what you're looking for, you can modify them on the fly :) – ThdK Apr 24 '12 at 11:56
  • I agree with @JimMischel. My answer is probably not the best here. I just wanted to help where I could, you know :) – ThdK Apr 24 '12 at 14:13
0

If you are using SQL Server as your backend, could you use SOUNDEX? I am unsure what you are trying to search. I assume you are building dynamic SQL from the search input. If not, I think there is a SoundEx for LINQ.

EDIT: I stand corrected; it appears there is some LINQ to SQL entity work that can be done for SoundEx.

However, MSDN does have a Soundex example, which worked fine in the simple tests I ran this morning: http://msdn.microsoft.com/en-us/library/bb669073.aspx

The changes I made: instead of .ToUpper(invariant) I used .ToUpperInvariant(), and instead of passing (string word) I made it an extension method (this string word).

Here is an example of what I ran:

List<string> animals = new List<string>();
animals.Add("dogs");
animals.Add("dog");
animals.Add("cat");
animals.Add("rabbits");
animals.Add("doggie");

string dog = "dog";
var data = from animal in animals
           where animal.SoundEx() == dog.SoundEx()
           select animal;

data : dogs, dog, doggie
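The SoundEx() extension method used in that query is not shown. The following is a minimal sketch of one, assuming the standard American Soundex rules; it is not the MSDN code, and the class name is hypothetical.

```csharp
using System.Text;

public static class SoundexExtensions
{
    // Digit for each letter A..Z; '0' marks letters with no code (vowels, H, W, Y).
    private const string Codes = "01230120022455012623010202";

    // Assumed American Soundex: first letter plus three digits, zero-padded.
    public static string SoundEx(this string word)
    {
        if (string.IsNullOrEmpty(word)) return string.Empty;

        var result = new StringBuilder();
        char lastCode = '0';
        foreach (char c in word.ToUpperInvariant())
        {
            if (c < 'A' || c > 'Z') continue;       // skip anything that is not A-Z
            char code = Codes[c - 'A'];
            if (result.Length == 0)
            {
                result.Append(c);                   // the first letter is kept as-is
                lastCode = code;
            }
            else
            {
                // Append only real codes that differ from the previous code.
                if (code != '0' && code != lastCode) result.Append(code);
                // H and W do not break a run of identical codes; vowels do.
                if (c != 'H' && c != 'W') lastCode = code;
            }
            if (result.Length == 4) break;
        }
        return result.Length == 0 ? string.Empty : result.ToString().PadRight(4, '0');
    }
}
```

Under these rules 'dog', 'dogs' and 'doggie' all encode to D200, which is consistent with the result listed above, while 'cat' gives C300 and 'rabbits' gives R132.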

Now with SQL Server, using CONTAINS/FREETEXT/CONTAINSTABLE etc. together with SOUNDEX against a catalog, you could also rank your results. (I am not familiar with the newer versions of SQL Server; the implementation I used goes back to SQL Server 2000.)

Also, if you are able to use SQL Server, you may want to look into this option: LINQ to SQL SOUNDEX - possible?

One concern with the PluralizationService solution is that it requires .NET 4.

There is also the Levenshtein distance algorithm that may be useful.
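As a rough illustration, here is a minimal sketch of the classic dynamic-programming form of Levenshtein distance. The class and method names are hypothetical, and how large a distance should count as "the same term" is left open.

```csharp
using System;

public static class EditDistance
{
    // Number of single-character insertions, deletions and substitutions
    // needed to turn a into b, using two rows of the DP table.
    public static int Levenshtein(string a, string b)
    {
        if (a == null) throw new ArgumentNullException("a");
        if (b == null) throw new ArgumentNullException("b");

        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;   // distance from the empty prefix of a

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;                                   // delete all i characters of a
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;     // reuse the two rows
        }
        return prev[b.Length];
    }
}
```

The distance from 'pipe' to 'pipes' is 1 and from 'stress' to 'stresses' is 2, but 'pipe' to 'pile' is also 1, so a plain distance threshold will let unrelated words through as well.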

Community
Adam W.
  • Welcome to Stack Overflow! How about providing more substance to your answer in the form of a working example of the technique you're suggesting? – Greg Bacon Apr 24 '12 at 14:30