I am searching for city names in a string:

import re

mystring = r'SDM\Austin'  # raw string, so the backslash is kept literally
city_search = r'(SD|Austin)'
mo_city = re.search(city_search, mystring, re.IGNORECASE)
city = mo_city.group(1)
print(city)

This will return city as 'SD'.

Is there a way to make 'Austin' the preference?

Switching the order to (Austin|SD) doesn't work.
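
For reference, a small sketch of why the swap has no effect: `re.search` returns the leftmost match in the string, and the order of the alternatives only matters when more than one of them can match at the same starting position. The variable names `first` and `swapped` are just for illustration.

```python
import re

mystring = r'SDM\Austin'

# 'SD' starts at index 0 and 'Austin' at index 4, so the scan reaches 'SD' first
# regardless of how the alternatives are ordered inside the group.
first = re.search(r'(SD|Austin)', mystring, re.IGNORECASE)
swapped = re.search(r'(Austin|SD)', mystring, re.IGNORECASE)

print(first.group(1), first.start())
print(swapped.group(1), swapped.start())
```

Both lines print `SD 0`.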

The answer is the same as "How can I find all matches to a regular expression in Python?", but the use case is a little different since one match is preferred.

sparrow
  • Switch the order of the options `(Austin|SD)`. Regex will stop at the first match. Otherwise, you can use the `findall()` method. – ctwheels Nov 21 '17 at 19:17
  • Have you tried swapping `SD` and `Austin` in the regex? – ForceBru Nov 21 '17 at 19:17
  • I did try that and it didn't work. Sorry I should have mentioned... updating. – sparrow Nov 21 '17 at 19:17
  • It would be helpful to see a bigger sample of the data you are parsing. – Adriano Nov 21 '17 at 19:25
  • I did show the data that I'm parsing. It's "mystring" – sparrow Nov 21 '17 at 19:29
  • Possible duplicate of [How can I find all matches to a regular expression in Python?](https://stackoverflow.com/questions/4697882/how-can-i-find-all-matches-to-a-regular-expression-in-python). You're using `search`; you need to use `findall` or `finditer`. – ctwheels Nov 21 '17 at 19:30
  • @sparrow there must be a wider application to this, because if you only ever want to extract `"Austin"` from `mystring` and assign it to `city`, why not just replace everything with `city = "Austin"`? – Joe Iddon Nov 21 '17 at 19:31
  • @JoeIddon good point. In the data I sometimes get "SD", which is the city of San Diego, so I want to return that, but it also gives false positives, so I need to avoid it if I see the full name of a city. It looks like I'll need to go with findall or finditer and code in some logic. – sparrow Nov 21 '17 at 19:34
  • @sparrow I wrote an answer explaining that if you want to use this regex, you must use `re.findall`. – Joe Iddon Nov 21 '17 at 19:35

2 Answers

You're using re.search, which returns only the first (leftmost) match. Use re.findall instead, which returns a list of all matches.

So if you modify your code to:

import re

mystring = r'SDM\Austin'  # raw string, so the backslash is kept literally
city_search = r'(SD|Austin)'
mo_city = re.findall(city_search, mystring, re.IGNORECASE)
city = mo_city[1]
print(city)

it will work fine, outputting:

Austin

So, mo_city is a list: ['SD', 'Austin'] and since we want to assign the second element (Austin) to city, we take index 1 with mo_city[1].

Joe Iddon
  • The nice thing about that approach is that I can list the things to search for in order of preference and assign the first match. – sparrow Nov 22 '17 at 04:06
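
One way to sketch the preference ordering described in the comment above, without hard-coding the index `1`. The helper name `preferred_city` and the `preferences` list are assumptions made for illustration, not part of the original answer; it simply tries each name in order and returns the first one that occurs anywhere in the string.

```python
import re

def preferred_city(text, preferences):
    """Return the first name in `preferences` that occurs in `text`, else None.

    `preferred_city` is a hypothetical helper, not a function from the `re` module.
    """
    for name in preferences:
        # re.escape keeps any special characters in a name from being read as regex syntax.
        if re.search(re.escape(name), text, re.IGNORECASE):
            return name
    return None

print(preferred_city(r'SDM\Austin', ['Austin', 'SD']))
print(preferred_city('SDM', ['Austin', 'SD']))
```

With the sample string this prints `Austin`, and it falls back to `SD` only when no full city name is present.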

Brief

You already have a great answer here (using findall instead of search with regex). This is another alternative (without using regex) that checks a string against a list of strings and returns matches. Based on the sample code you provided, this should work for you and is probably easier than the regex method.

Code

cities = ['SD', 'Austin']  # renamed from `list` to avoid shadowing the built-in
s = r'SDM\Austin'
for city in cities:
    if city in s:
        print('"{}" exists in "{}"'.format(city, s))
ctwheels