Finding and extracting multiple substrings in a string?

Question

After looking at a few similar questions, I have not been able to successfully implement a substring split on my data. For my specific case, I have a bunch of strings, and each string has a substring I need to extract. The strings are grouped together in a list and my data is NBA positions. I need to pull out the positions (either 'PG', 'SG', 'SF', 'PF', or 'C') from each string. Some strings will have more than one position. Here is the data:

text = ['Chi\xa0SG, SF\xa0\xa0DTD','Cle\xa0PF']

The code should ideally look at the first string, 'Chi\xa0SG, SF\xa0\xa0DTD', and return the two positions, ['SG','SF']. For the second string, it should return ['PF'].

Can you add the complete expected output for clarity? For example, is this what you are looking for? `[re.findall(r'\b(PG|SG|SF|PF|C)\b', s) for s in text]` – Sundeep Oct 17 '16 at 04:55

score 2 · Accepted Answer · answered Oct 17 '16 at 04:48

Leverage (zero width) lookarounds:

(?<!\w)(?:PG|SG|SF|PF|C)(?!\w)

(?<!\w) is a zero-width negative lookbehind, making sure the match is not preceded by a word character (letter, digit, or underscore)
(?:PG|SG|SF|PF|C) is a non-capturing group matching any of the desired positions; without the group, the alternation would bind the lookbehind to PG only and the lookahead to C only
(?!\w) is a zero-width negative lookahead, making sure the match is not followed by a word character

Example:

In [6]: import re

In [7]: s = 'Chi\xa0SG, SF\xa0\xa0DTD'

In [8]: re.findall(r'(?<!\w)(?:PG|SG|SF|PF|C)(?!\w)', s)
Out[8]: ['SG', 'SF']
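
To cover the whole list from the question, the same lookarounds can be applied to each string in turn. This is a minimal sketch, assuming Python 3, with the alternation wrapped in a non-capturing group so that both lookarounds apply to every alternative. The names `POSITION_RE` and `extract_positions` are illustrative and do not come from the question.

```python
import re

# Sample data from the question; '\xa0' is a non-breaking space.
text = ['Chi\xa0SG, SF\xa0\xa0DTD', 'Cle\xa0PF']

# The non-capturing group makes both lookarounds apply to every alternative.
POSITION_RE = re.compile(r'(?<!\w)(?:PG|SG|SF|PF|C)(?!\w)')


def extract_positions(strings):
    """Return one list of matched positions per input string."""
    return [POSITION_RE.findall(s) for s in strings]


print(extract_positions(text))  # [['SG', 'SF'], ['PF']]
```

Because the group is non-capturing, `findall` returns the full matches rather than group contents, so the result is a list of lists in the same order as the input.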

Why not use word boundary? `r'\b(PG|SG|SF|PF|C)\b'` – Sundeep Oct 17 '16 at 04:51

Thomas Kilkelly · Answer 2 · 2016-10-17T05:16:54.403

0

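heemayl's response is the most correct, but you could probably get away with plain string splitting: take the field after the first '\xa0', which holds the positions, and split it on ', '. Keeping only the last two characters of each comma-separated piece is not enough, because the trailing 'DTD' tag would yield 'TD' for the last piece.

s = 'Chi\xa0SG, SF\xa0\xa0DTD'
# Fields are separated by '\xa0'; the second field holds the comma-separated positions
fin = s.split('\xa0')[1].split(', ')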

I can't test this at the moment as I'm on a chromebook but it should work.
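
A variant of the splitting idea that does not depend on character offsets or field order is to tokenize each string and keep only the tokens that are known position codes. This is a rough sketch, assuming Python 3 (where `str.split()` with no arguments also splits on '\xa0') and assuming no team abbreviation or status tag coincides with a position code. `POSITIONS` and `positions_by_tokens` are hypothetical names used for illustration.

```python
# Position codes listed in the question.
POSITIONS = {'PG', 'SG', 'SF', 'PF', 'C'}


def positions_by_tokens(s):
    """Split on commas and whitespace, keeping only known position codes."""
    # In Python 3, str.split() with no arguments also splits on '\xa0'.
    tokens = s.replace(',', ' ').split()
    return [tok for tok in tokens if tok in POSITIONS]


text = ['Chi\xa0SG, SF\xa0\xa0DTD', 'Cle\xa0PF']
print([positions_by_tokens(s) for s in text])  # [['SG', 'SF'], ['PF']]
```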

edited Oct 17 '16 at 05:16

answered Oct 17 '16 at 04:52

I currently have no way to test it but I gave it a go anyway – Thomas Kilkelly Oct 17 '16 at 05:18
