I have a list of every city in the world in my database, and an application written in C# that needs to search an incoming string to determine whether any of my cities appear in that string. However, I'm having trouble figuring out the regex pattern because some cities are two words, like "San Francisco". Thanks in advance for any help.
-
Are there no cities that are three words?! – Mark Byers Feb 10 '12 at 15:21
-
And what does [NSRegularExpression](https://developer.apple.com/library/mac/#documentation/Foundation/Reference/NSRegularExpression_Class/Reference/Reference.html) have to do with C#? – Mark Byers Feb 10 '12 at 15:22
-
little more of an example...what are "my cities"? – circusdei Feb 10 '12 at 15:22
-
There are also words that are city names. Anyway, I don't think regexes are the correct tool for your problem. – Peter Feb 10 '12 at 15:23
-
As Mark said: http://en.wikipedia.org/wiki/Truth_or_Consequences,_New_Mexico . You might need to allow three or more words. – DRobinson Feb 10 '12 at 15:50
-
Sorry about the NSRegularExpression tag, I was not paying attention. Anyway, you are all correct there could be three words for a city. I'm researching Regex combinations and still coming up short. – Ryan Lege Feb 11 '12 at 11:02
1 Answer
Probably the easiest way is to load all your cities into memory (`select name from cities`) and then use regex or simple string methods to see whether any of them are found in the text.
List<string> cities = GetCitiesFromDatabase(); // need to implement this yourself
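string text = @"the text containing city names such as Amsterdam and San Francisco";
bool containsACity = cities.Any(city => text.Contains(city)); // For a case-insensitive search, pass StringComparison.CurrentCultureIgnoreCase; on frameworks without that Contains overload, use text.IndexOf(city, StringComparison.CurrentCultureIgnoreCase) >= 0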
IEnumerable<string> containedCities = cities.Where(city => text.Contains(city));
To ensure that 'Amsterdam' wouldn't match on 'Amsterdamned', you could use a regular expression instead of Contains:
bool containsACity = cities.Any(city => Regex.IsMatch(text, @"\b" + Regex.Escape(city) + @"\b"));
// Add RegexOptions.IgnoreCase for case insensitive matches.
IEnumerable<string> containedCities = cities.Where(city => Regex.IsMatch(text, @"\b" + Regex.Escape(city) + @"\b"));
Alternatively, you can build a large regular expression to search for any city and execute that once:
string regex = @"\b(?:" + String.Join("|", cities.Select(city => Regex.Escape(city)).ToArray()) + @")\b";
bool containsACity = Regex.IsMatch(text, regex, RegexOptions.IgnoreCase);
IEnumerable<string> containedCities = Regex.Matches(text, regex, RegexOptions.IgnoreCase).Cast<Match>().Select(m => m.Value);
You can improve the performance of these calls by caching the list of cities or caching the regular expression (and improve even further by creating a static readonly Regex object with RegexOptions.Compiled).
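As a rough sketch of that caching idea, the combined pattern could be built once and reused. The `CityMatcher` class and its method names are hypothetical, made up for this illustration. The sketch also sorts the names longest-first, because .NET tries alternatives from left to right and would otherwise report "San" where the text says "San Francisco" if both are in the list. It assumes the list contains at least one city.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds the combined pattern once and reuses it.
public sealed class CityMatcher
{
    private readonly Regex _pattern;

    // Assumes 'cities' contains at least one non-empty name.
    public CityMatcher(IEnumerable<string> cities)
    {
        // Longer names first, so "San Francisco" is tried before "San".
        var alternatives = cities
            .Where(city => !string.IsNullOrWhiteSpace(city))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .OrderByDescending(city => city.Length)
            .Select(city => Regex.Escape(city));

        _pattern = new Regex(
            @"\b(?:" + string.Join("|", alternatives) + @")\b",
            RegexOptions.IgnoreCase | RegexOptions.Compiled);
    }

    // True if at least one city name occurs as a whole word or phrase.
    public bool ContainsCity(string text) => _pattern.IsMatch(text);

    // Returns the matched substrings in the order they appear in the text.
    public IEnumerable<string> FindCities(string text) =>
        _pattern.Matches(text).Cast<Match>().Select(m => m.Value);
}
```

You would create one instance when the city list is loaded (for example, held in a static readonly field) and call `FindCities` for each incoming string. With a very large list, `RegexOptions.Compiled` can make construction noticeably slower, so it is worth measuring with and without it.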
Another solution would be to do this in the database: instead of storing a local list of cities in memory, send the input to the database and use a LIKE statement or regex inside the database to compare the list of cities against the text. Depending on the number of cities and the size of the text, this might be faster, but whether it is possible depends on the database being used.
