I have a string like this: ape4banana3 and I split it like this:

>>> re.split('([1-5]?)|\s', "ape4banana3")
['ape', '4', 'banana', '3', '']

Why do I get the trailing '' in my result? Can I get rid of it by writing a smarter regex?

Side note: The regex has the alternation because sometimes the string looks like this: ape4 banana3 and then I want to lose the whitespace.

For extra credit: Is there a way I can get this result instead? ['ape4', 'banana3']?

Fylke

2 Answers

You're seeing the extra empty string because you're splitting on digits and the string ends with one, so you get the empty string that follows your last digit.
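A minimal sketch of why the empty string appears and one way to drop it, assuming Python 3 and a simplified pattern `([1-5])`. (The question's `([1-5]?)` can match the empty string, which `re.split` handles differently from Python 3.7 onward, so it is left out here.) The names `parts` and `cleaned` are only illustrative.

```python
import re

text = "ape4banana3"

# Splitting on a captured digit keeps the digits in the result.
# The string ends with a delimiter, so split() also reports the
# (empty) piece that follows it.
parts = re.split(r"([1-5])", text)
print(parts)  # ['ape', '4', 'banana', '3', '']

# Dropping empty pieces afterwards is simpler than changing the regex.
cleaned = [p for p in parts if p]
print(cleaned)  # ['ape', '4', 'banana', '3']
```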

As to your extra credit, this seems like the easiest solution:

>>> re.findall(r"([a-zA-Z]+[1-5]+)", "ape4banana3")
['ape4', 'banana3']
>>> re.findall(r"([a-zA-Z]+[1-5]+)", "ape4 banana3")
['ape4', 'banana3']

You might need to replace [a-zA-Z] with a more specific or less specific pattern depending on your use case. This regex is based only on the strings you've posted here.
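As a sketch of loosening the pattern, assuming the names are ASCII letters and the number may be any run of digits (`\d+` instead of `[1-5]+`). The helper name `tokens` and the second sample input are made up for illustration.

```python
import re

def tokens(text):
    """Return every run of letters that is immediately followed by digits."""
    return re.findall(r"[A-Za-z]+\d+", text)

print(tokens("ape4banana3"))          # ['ape4', 'banana3']
print(tokens("ape4 banana12 Kiwi7"))  # ['ape4', 'banana12', 'Kiwi7']
```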

Nolen Royalty
  • What's the significance of the 'r' before the regex string? – Fylke Oct 12 '13 at 00:42
  • @Fylke the `r` marks the string as a [raw string](http://docs.python.org/2/library/re.html#raw-string-notation), which means that you don't need to escape backslashes. It's not technically needed in this case, but I do it out of habit, as most regexes end up needing something like a `\d` or `\w` – Nolen Royalty Oct 12 '13 at 00:43
  • (Although to be honest, in this case it's because I'm really lazy: I used `\d` to write my solution because it's easier to type, then replaced it with `[1-5]` after posting :p) – Nolen Royalty Oct 12 '13 at 00:47
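
To illustrate the raw-string point from the comments, here is a small sketch assuming Python 3. The sample text and the variable names `raw_hits` and `plain_hits` are invented for the example.

```python
import re

# A raw string keeps the backslash as a literal character.
print(r"\d" == "\\d")         # True
print(len(r"\n"), len("\n"))  # 2 1

# "\b" in a normal string is a backspace character, not a word boundary,
# so the non-raw pattern silently matches nothing here.
raw_hits = re.findall(r"\bape\b", "an ape here")
plain_hits = re.findall("\bape\b", "an ape here")
print(raw_hits)    # ['ape']
print(plain_hits)  # []
```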

This is because the 3 splits the string into banana and an empty string at the end.

As for the second result, could you just split on \s?

Edit: Oh I see, the space is not always there.

You can match like:

 ([A-Za-z]+[1-5])\s?([A-Za-z]+[1-5])

The parentheses capture each enclosed section as a group, so each one comes back as its own element of the result.
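
A sketch of how capturing groups come back as separate values, assuming Python 3 and a pattern of one or more letters followed by a digit from 1 to 5, with optional whitespace between the two tokens. The helper name `two_groups` is hypothetical, and this approach only handles strings with exactly two tokens.

```python
import re

# Two capturing groups, optionally separated by one whitespace character.
PATTERN = re.compile(r"([A-Za-z]+[1-5])\s?([A-Za-z]+[1-5])")

def two_groups(text):
    """Return the two captured tokens as a list, or [] if there is no match."""
    match = PATTERN.match(text)
    return list(match.groups()) if match else []

print(two_groups("ape4banana3"))   # ['ape4', 'banana3']
print(two_groups("ape4 banana3"))  # ['ape4', 'banana3']
```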

Neil Neyman