Unexpected behavior iterating three IF statements over a list in Python

Question

I'm trying to iterate over the items in a list generated by split() in Python 3.4, and I can't understand why it's not working the way I expect. Here's the code:

import re

seqdes = '48 Marshall McDonald advances to 1st (single), 43 Nicholas Boggan advances to 2nd (48), 48 Marshall McDonald advances to 2nd (wild pitch), 43 Nicholas Boggan advances to 3rd (wild pitch)'
firstbaselist = []
secondbaselist = []
thirdbaselist = []

for item in seqdes.split(','):
    if re.compile('.*advances to 1st.*').match(item):
        firstbaselist.append(re.compile('\d\d').match(item).group(0))
    if re.compile('.*advances to 2nd.*').match(item):
        secondbaselist.append(re.compile('\d\d').match(item).group(0))
    if re.compile('.*advances to 3rd.*').match(item):
        thirdbaselist.append(re.compile('\d\d').match(item).group(0))

I expected this to look at each of the four things created by seqdes.split(',') and if it found the regex match, append the two digits found at the start of each line to the designated list. Instead, I get:

Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

So I see that the code is trying to run the secondbaselist.append piece on an item from the seqdes.split list that doesn't contain "advances to 2nd" anywhere, but I don't know why it's doing that. Since the if statement is false there, I wouldn't think it would try the append part; obviously then I'm not getting the desired behavior from the if statements, but I don't understand why.

I have also tried this with if item.find("advances to 1st"), etc, with no change. What am I missing?

You have a typo: `for item in seqdes.split(',')"` has a `"` at the end instead of a `:`. (That's not your error) — A.J. Uppal, Mar 23 '15 at 21:50
Heh - yep. Fixed it. But yeah, that was a transcription error from the python window to here, not the source of the issue. — John C, Mar 23 '15 at 21:54
You need to have a parentheses group to capture, and also you should use raw strings: `re.compile(r'(\d\d)')....`. Not sure why you are compiling all the time, what is wrong with `re.search()`? Come to think of it, you don't really need an RE for the `if` condition, `in` should do it. — cdarke, Mar 23 '15 at 22:13
I agree that the RE wasn't necessary in the IF condition, but trying a non-regex alternative, item.find(""), didn't work either. I have literally no experience with regex prior to this, so it wouldn't surprise me if I was doing something wrong and/or inefficiently with regards to that part of it. The solution was, in fact, to use `re.search` instead of `re.match` - the answer below was exactly what I needed. Thanks for reading my question, though! — John C, Mar 23 '15 at 22:26

score 1 · Accepted Answer · edited May 23 '17 at 11:56

The error occurs because you are using re.match instead of re.search. The difference between the two is explained in the question "What is the difference between Python's re.search and re.match?"

The reason for your error is this line in the re.match docs:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

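When you split your string, the second item in the resulting list is ' 43 Nicholas Boggan advances to 2nd (48)', which begins with a space. The if condition still matches, because its pattern starts with .* and that absorbs the space. The '\d\d' pattern does not allow for the leading space, though, so re.match fails at the start of the string and returns None. So the line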

secondbaselist.append(re.compile('\d\d').match(item).group(0))

becomes None.group(0), and a NoneType object has no attribute 'group'.

Using re.search instead should fix this, because it scans the whole string for a match rather than only checking the beginning.
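To make the difference concrete, here is a minimal sketch using the second item from the split. The variable names (condition, anchored, scanned, stripped) are only illustrative and are not from the question.

```python
import re

# Second piece produced by seqdes.split(','); note the leading space.
item = ' 43 Nicholas Boggan advances to 2nd (48)'

# The if-condition pattern starts with .*, which absorbs the leading space,
# so re.match succeeds here and the if branch is entered.
condition = re.match(r'.*advances to 2nd.*', item)

# re.match only tries position 0, where the character is a space, not a digit.
anchored = re.match(r'\d\d', item)

# re.search scans forward and finds the first two-digit run in the string.
scanned = re.search(r'\d\d', item)

# Stripping surrounding whitespace first also lets re.match succeed.
stripped = re.match(r'\d\d', item.strip())

print(condition is not None)  # True
print(anchored)               # None
print(scanned.group(0))       # 43
print(stripped.group(0))      # 43
```

Splitting on ', ' instead of ',' would be another way to avoid the leading space, assuming the separator is always a comma followed by a single space.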

This worked perfectly - thanks! To confirm that this was the solution I also tested it by using re.match(item.strip()), which took off the leading space, and that was successful as well. — John C, Mar 23 '15 at 22:23

score 1 · Answer 2 · answered Mar 23 '15 at 22:25

Try this:

import re
seqdes = '48 Marshall McDonald advances to 1st (single), 43 Nicholas Boggan advances to 2nd (48), 48 Marshall McDonald advances to 2nd (wild pitch), 43 Nicholas Boggan advances to 3rd (wild pitch)'
firstbaselist = []
secondbaselist = []
thirdbaselist = []

for item in seqdes.split(','):
    if 'advances to 1st' in item:
        firstbaselist.append(re.search(r'(\d\d)',item).group(0))
    elif 'advances to 2nd' in item:
        secondbaselist.append(re.search(r'(\d\d)',item).group(0))
    elif 'advances to 3rd' in item:
        thirdbaselist.append(re.search(r'(\d\d)',item).group(0))

print(firstbaselist)
print(secondbaselist)
print(thirdbaselist)

Gives:

['48']
['43', '48']
['43']
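As a possible variation (a sketch, not something proposed in the thread), the three branches could be collapsed into one pattern that captures both the number and the base. The pattern and the runners dictionary below are hypothetical names introduced for illustration, and the sketch assumes every number is exactly two digits, as in the sample data.

```python
import re

seqdes = '48 Marshall McDonald advances to 1st (single), 43 Nicholas Boggan advances to 2nd (48), 48 Marshall McDonald advances to 2nd (wild pitch), 43 Nicholas Boggan advances to 3rd (wild pitch)'

# Group 1: the two-digit number; group 2: the base reached.
pattern = re.compile(r'(\d\d)\b.*?advances to (1st|2nd|3rd)')

# One list per base, keyed by the text captured in group 2.
runners = {'1st': [], '2nd': [], '3rd': []}

for item in seqdes.split(','):
    m = pattern.search(item)  # search, not match, so a leading space is harmless
    if m:                     # skip items that do not describe an advance
        runners[m.group(2)].append(m.group(1))

print(runners['1st'])  # ['48']
print(runners['2nd'])  # ['43', '48']
print(runners['3rd'])  # ['43']
```

Checking that the match object is not None before calling group() also avoids the AttributeError from the question if an item has an unexpected format.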
