
Let's say I have this text:

1.1 This is the 2,1 first 1.2 This is the 2,2 second 1.3 This is the 2,3 third 

and I want:

["1.1 This is the 2,1 first","1.2 This is the 2,2 second","1.3 This is the 2,3 third"]

Note that:

  • I can't use re.findall, since I can't think of a way to properly terminate the match. The best I could think of was '[0-9]+\.[0-9]+^([0-9]+\.[0-9]+)*', which didn't work.

  • I can't just store the delimiter as a global variable, since it changes with each match.

  • I could not use a regular re.split because I want to keep the delimiter. I can't use a lookbehind because it has to be fixed width, and this isn't.

I have read "regexp split and keep the seperator", "Python split() without removing the delimiter", and "In Python, how do I split a string and keep the separators?", and still don't have an answer.

Ester Lin

2 Answers


Yes, you can:

\b\d+\.\d+
.+?(?=\d+\.\d+|$)

See it working on regex101.com. Use it with re.findall():

import re
rx = re.compile(r'\b\d+\.\d+.+?(?=\d+\.\d+|$)')
string = "1.1 This is the 2,1 first 1.2 This is the 2,2 second 1.3 This is the 2,3 third "
matches = rx.findall(string)
print(matches)
# ['1.1 This is the 2,1 first ', '1.2 This is the 2,2 second ', '1.3 This is the 2,3 third ']
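Each match above keeps the space that precedes the next number, while the output requested in the question has none. A minimal sketch of one way to trim it, assuming surrounding whitespace is never significant (the names `text` and `items` are illustrative, not from the answer):

```python
import re

# Same pattern as in the answer above.
rx = re.compile(r'\b\d+\.\d+.+?(?=\d+\.\d+|$)')
text = "1.1 This is the 2,1 first 1.2 This is the 2,2 second 1.3 This is the 2,3 third "

# Strip each match so the trailing space before the next number is dropped.
items = [m.strip() for m in rx.findall(text)]
print(items)
# ['1.1 This is the 2,1 first', '1.2 This is the 2,2 second', '1.3 This is the 2,3 third']
```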

If the string spans multiple lines, use either DOTALL mode (re.DOTALL) or replace .+? with [\s\S]+?.
See a demo on ideone.com.
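As a rough illustration of the multi-line case, the sketch below compiles the same pattern with `re.DOTALL` so that `.` also matches newlines. The sample string `multiline` is made up for this example and is not from the question:

```python
import re

# re.DOTALL lets '.' match '\n', so one item may span several lines.
rx = re.compile(r'\b\d+\.\d+.+?(?=\d+\.\d+|$)', re.DOTALL)

# Hypothetical input: each numbered item is broken across two lines.
multiline = "1.1 This is the\n2,1 first\n1.2 This is the\n2,2 second\n"

# Strip the newline that separates one item from the next.
parts = [m.strip() for m in rx.findall(multiline)]
print(parts)
# ['1.1 This is the\n2,1 first', '1.2 This is the\n2,2 second']
```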

Jan

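Split on each space that is immediately followed by a number such as 1.2 or 1.3. A lookahead is zero-width, so the number stays in the result. The dot has to be escaped; otherwise \d.\d also matches 2,1 and the string is split there too:

re.split(r' (?=\d+\.\d+)', s)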
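A runnable sketch of the lookahead split on the question's sample string. It assumes the dot in the lookahead is escaped (so that `2,1` is not treated as a delimiter) and that the trailing space can be stripped before splitting; `s` and `parts` are illustrative names:

```python
import re

s = "1.1 This is the 2,1 first 1.2 This is the 2,2 second 1.3 This is the 2,3 third "

# Split on whitespace that is followed by digits, a literal dot, and digits.
# The lookahead consumes nothing, so each number stays at the start of its item.
parts = re.split(r'\s+(?=\d+\.\d+)', s.strip())
print(parts)
# ['1.1 This is the 2,1 first', '1.2 This is the 2,2 second', '1.3 This is the 2,3 third']
```

Unlike a lookbehind, a lookahead in Python's `re` module may be variable-width, which is why this form sidesteps the fixed-width restriction mentioned in the question.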
zxy