-2

I'm using regex to split a string with a specific format but I'm still missing one number.

The string format is like this: "22TKL;33TKL;22FBL;35TKL". I'd like the output to be pairs of a number and a substring, like: (22, TKL), (33, TKL), (22, FBL), (35, TKL).

Using regex101 to create the expression, I came up with this: "[^;]+([0-9]+)([a-zA-z]+)". And according to regex101, the grouping is like this:

[Image: regex101 screenshot of the match groups]

So I'm unable to capture the first digit of each number, and I don't know how to solve this.

  • Why not just split the string on semicolons and take the last 3 characters? `[(part[:-3], part[-3:]) for part in string.split(";")]` – Alain T. Mar 14 '23 at 20:10

2 Answers

1

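Just `([0-9]+)([a-zA-Z]+)` works fine: https://regex101.com/r/sMxFQX/2 (note `A-Z`; the range `A-z` in your pattern also matches a few punctuation characters such as `[` and `_`).

It didn't work for you because `[^;]+` is greedy and consumed the leading digits of the number, since a digit is also a character that is not `;`. Only the last digit was left over for `([0-9]+)`, which needs at least one.

Alternatively, if you do need this check, you can use `[^;]*?([0-9]+)([a-zA-Z]+)`.

The `?` I added makes the quantifier lazy, meaning it consumes as few characters as it can, so it does not swallow the first digit unless it has to. I also changed `+` to `*` so that the prefix is allowed to be empty; with `+?` it would still have to consume at least one character.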
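To make the difference concrete, here is a minimal Python sketch comparing the three patterns on the string from the question. The variable names are only illustrative, and the letter class is written as `[a-zA-Z]`, which is an assumption about what the asker intended.

```python
import re

text = "22TKL;33TKL;22FBL;35TKL"

# Greedy prefix: [^;]+ takes as much as it can, then backtracks just far
# enough to leave a single digit for the first group.
greedy = re.findall(r"[^;]+([0-9]+)([a-zA-Z]+)", text)

# No prefix: each match starts directly at the digits.
plain = re.findall(r"([0-9]+)([a-zA-Z]+)", text)

# Lazy prefix: [^;]*? tries the empty match first, so no digit is lost.
lazy = re.findall(r"[^;]*?([0-9]+)([a-zA-Z]+)", text)

print(greedy)  # [('2', 'TKL'), ('3', 'TKL'), ('2', 'FBL'), ('5', 'TKL')]
print(plain)   # [('22', 'TKL'), ('33', 'TKL'), ('22', 'FBL'), ('35', 'TKL')]
print(lazy)    # same as plain
```

With `re.findall` and two capture groups, each match comes back as a tuple of strings, which is already close to the pair format the question asks for.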

Ron Serruya
0

If you are looking to get a list of lists as output to maintain the separation of the semicolons, you could consider splitting first by the semi-colon, and then throwing regex at each element:

import re
list_of_lists = [re.findall(r'\d+|\D+', x) for x in "22TKL;33TKL;22FBL;35TKL".split(";")]


[['22', 'TKL'], ['33', 'TKL'], ['22', 'FBL'], ['35', 'TKL']]

This keeps the regex nice and simple and gives you more workable output, since each term's parts are kept in their own list. The pattern just says to look for either a run of digits (`\d+`) or a run of non-digits (`\D+`).
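If you want the numeric pairs from the question rather than lists of strings, the same split-first idea can be sketched as below. Converting the number with `int` and raising on a malformed piece are assumptions on my part, not something the question specifies.

```python
import re

text = "22TKL;33TKL;22FBL;35TKL"

pairs = []
for part in text.split(";"):
    # fullmatch requires the whole piece to be digits followed by letters.
    m = re.fullmatch(r"(\d+)([A-Za-z]+)", part)
    if m is None:
        # Assumed behaviour: reject pieces that do not fit the format.
        raise ValueError(f"unexpected piece: {part!r}")
    pairs.append((int(m.group(1)), m.group(2)))

print(pairs)  # [(22, 'TKL'), (33, 'TKL'), (22, 'FBL'), (35, 'TKL')]
```

Using `re.fullmatch` on each piece means a stray character in the input is reported instead of being silently skipped, which `re.findall` on the whole string would do.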

JNevill