Given the following dataset, I am trying to use a regex to pull out the city names:
Boston (MA), New York City (NY, CT, NJ)
New York City (NY, CT, NJ), Philadelphia (PA, NJ)
Indianapolis (IN), St. Louis (MO, IL)
St. Louis (MO, IL), Kansas City (MO, KS)
I want the output of the Regex to be:
Boston,New York City
New York City,Philadelphia
Indianapolis,St. Louis
St. Louis,Kansas City
I attempted to pattern match based on two criteria:
(\\w+\\w(?=.())) | (\\w+\\W\\h\\w+(?=.()))
- Cities consisting only of letters, [a-zA-Z]+, such as Boston or Philadelphia.
- Cities containing additional characters, such as periods or extra spaces.
The expression accurately matches the first case. However, for the second case, it only matches the first occurrence of St. Louis.
I also tried the following:
(\\w+ ?\\w(?=.())) | (\\w+\\h\\w+\\h\\w+(?=\\s.()))| (\\w+\\h\\w+(?=\\s.()))
- The first covers the same case as listed above: one-word cities.
- The third one manages to cover the case of New York City; however, just like the first one, it fails to recognize later occurrences of the same pattern.
- The case from the previous pattern, which matched St. Louis, now fails to match, and Kansas City is matched instead.
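For comparison, here is a minimal sketch of a different approach, written in Python as an assumption (the double-escaped patterns above look like Java or R string literals, where the escaping would differ). Instead of enumerating every shape a city name can take, it deletes each parenthesized state list and then tidies up the commas. The names `STATE_LIST` and `extract_cities` are hypothetical and only serve the illustration.

```python
import re

# Sample rows copied from the dataset above.
rows = [
    "Boston (MA), New York City (NY, CT, NJ)",
    "New York City (NY, CT, NJ), Philadelphia (PA, NJ)",
    "Indianapolis (IN), St. Louis (MO, IL)",
    "St. Louis (MO, IL), Kansas City (MO, KS)",
]

# Optional whitespace followed by a parenthesized group.
# Assumes the parentheses are never nested.
STATE_LIST = re.compile(r"\s*\([^)]*\)")


def extract_cities(row: str) -> str:
    """Return the city names in a row, joined by commas."""
    # Remove every "(...)" group, e.g. "Boston, New York City".
    without_states = STATE_LIST.sub("", row)
    # Split on the remaining commas and trim the surrounding spaces.
    return ",".join(part.strip() for part in without_states.split(","))


for row in rows:
    print(extract_cities(row))
```

Because the pattern describes what to discard rather than what to keep, periods and multi-word names such as St. Louis or New York City need no special alternatives. It would break if a city name itself contained a comma or parentheses, which the sample data does not.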