I want to split each term on a delimiter, storing the leading number as the index and the rest of the string as the name.

My terms:

The Beehive
12. Bar 821
13. Natives Bar
14. Last Call Bar
15. Scarlet Lounge
16. Linden Room
17. Rooftop 25

I'm using this code:

terms = ['The Beehive', '12. Bar 821', '13. Natives Bar', '14. Last Call Bar', '15. Scarlet Lounge', '16. Linden Room', '17. Rooftop 25']

delim = re.match('\d+\. ', terms)

if delim is None:
    print(delim)
else:
     index = index[:delim.end()]
     name = index[delim.end():]

This fails to capture the split. I've tested it by printing the delim and it doesn't match anything.

Sebastian
  • Are you sure you didn't get any error at `delim = re.match('\d+\. ', terms)`. `terms` is a list and not a string – mad_ Feb 06 '19 at 19:34

2 Answers

You are passing a list to re.match(), which expects a string. Iterate over the list and search each term individually:

import re
terms = ['The Beehive', '12. Bar 821', '13. Natives Bar', '14. Last Call Bar', '15. Scarlet Lounge', '16. Linden Room', '17. Rooftop 25']

delim = re.compile(r'\d+\.')  # raw string so the backslashes reach the regex engine unchanged
for term in terms:
    match = delim.search(term)
    if match:
        print(term[:match.end()]) #index
        print(term[match.end():]) #name
mad_
  • There's something weird with my text string. It wasn't matching because the space was actually an invisible character `~`. This worked `delim = re.match(r'\d+\.~', terms[i])` – Sebastian Feb 06 '19 at 20:52
  • Except it wasn't a tilde. It appears grey in my text editor and I can't copy and paste it to Stack Exchange. It appears as nothing when I copy/paste outside the editor – Sebastian Feb 06 '19 at 20:53
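
The comments above do not say what the invisible character actually was. As a sketch only, assuming it was a no-break space (U+00A0, a common culprit in copied text), repr() would reveal it as an escape, and \s would match it where a literal space does not. The sample terms below are hypothetical, not the asker's real data.

```python
import re

# Hypothetical sample: the second term has a no-break space (U+00A0)
# after the dot instead of an ordinary space. This is an assumption;
# the thread never identifies the real character.
terms = ['12. Bar 821', '13.\u00a0Natives Bar']

for term in terms:
    # repr() shows non-printable characters as escape sequences.
    print(repr(term))

# For str patterns, \s matches Unicode whitespace, including U+00A0.
pattern = re.compile(r'(\d+)\.\s+(.*)')
parsed = [pattern.match(term).groups() for term in terms]
print(parsed)
```

If the character turns out not to be whitespace at all, printing repr() of the term should still show which code point needs to go into the pattern.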

re.match() accepts a single string, not a list, so you have to iterate over terms and match each one:

>>> for term in terms:
...     match = re.match(r'^(?P<index>(\d+\. )?)(?P<name>.*)$', term)  # Return a match object which contains the named groups.
...     index, _, name = match.groups()  # Unpack the groups.
...     # index = match.group('index')
...     # name = match.group('name')
...     print(index, name)
... 
 The Beehive
12.  Bar 821
13.  Natives Bar
14.  Last Call Bar
15.  Scarlet Lounge
16.  Linden Room
17.  Rooftop 25

Also notice the use of named groups in the regular expression: re.match() returns a Match object, and the named groups can be read from it with match.group('index') and match.group('name').
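
To get the number as a real index and the name without the extra spacing, a variant along these lines might work. It is a sketch, not the only way to do it: the helper name split_term is made up for illustration, and it assumes a term without a leading number should get None as its index.

```python
import re

terms = ['The Beehive', '12. Bar 821', '13. Natives Bar', '14. Last Call Bar',
         '15. Scarlet Lounge', '16. Linden Room', '17. Rooftop 25']

# The optional, non-capturing prefix holds the number, the dot and a space;
# only the digits themselves are captured in the named group "index".
pattern = re.compile(r'^(?:(?P<index>\d+)\. )?(?P<name>.*)$')

def split_term(term):
    """Return (index, name); index is None when the term has no number."""
    match = pattern.match(term)
    index = match.group('index')
    return (int(index) if index is not None else None, match.group('name'))

pairs = [split_term(term) for term in terms]
for index, name in pairs:
    print(index, name)
```

Because the whole prefix is optional and the name group accepts any remaining text, the pattern matches every term in the list, so match is never None here.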

Regarding whether to use the r'' prefix or not, take a look at this question or this excerpt from the docs:

The r prefix, making the literal a raw string literal, is needed […] because escape sequences in a normal “cooked” string literal that are not recognized by Python, as opposed to regular expressions, now result in a DeprecationWarning and will eventually become a SyntaxError. See The Backslash Plague.
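
A quick way to see what the prefix changes, using the pattern from the question: the raw literal and the doubly escaped normal literal are the same string, while for sequences Python does recognize, such as \n, the two forms differ.

```python
import re

raw = r'\d+\. '        # raw literal: backslashes are kept as written
escaped = '\\d+\\. '   # normal literal: each backslash must be doubled

# Both spellings produce the identical pattern string.
same = (raw == escaped)
print(same)

# For an escape Python recognizes, the two forms are different strings:
# r'\n' is a backslash followed by "n", '\n' is a single newline character.
lengths = (len(r'\n'), len('\n'))
print(lengths)

# The pattern consumes the number, the dot and the following space.
end = re.match(raw, '12. Bar 821').end()
print(end)
```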

Jens