
I have a string:

2km739

and I am trying to use a regex to capture the 2739

I know I could just use two capture groups and combine them after (EDIT: or extract the numerical chars after I capture the group), but this would be a little easier in this situation and I am curious if this is possible.

I have this:

([0-9](?=[km])(?<=[km])\d+)

but it doesn't work

it only works if I add the km in there somewhere

([0-9](?=[km])km(?<=[km])\d+)

I also thought this would work, but I learned that text matched by a non-capturing group is still included in the enclosing capture group:

([0-9](?:km)\d+)
moto
  • 2
    Can't you just [remove all non-numeric chars](https://stackoverflow.com/questions/1249388/removing-all-non-numeric-characters-from-string-in-python)? – Wiktor Stribiżew Mar 30 '18 at 21:19
  • You can't have holes in capture groups. – Aran-Fey Mar 30 '18 at 21:19
  • @Wiktor I need the regex to find the 2km739, but you are right. I can do that afterward – moto Mar 30 '18 at 21:21
  • Do you need to use regex? Are you just trying to capture the numbers in the sequence they appear? If so you can `''.join([x for x in list('2km739') if x.isnumeric()])` – Aaron Lael Mar 30 '18 at 21:29

2 Answers


If you want to remove all of the letters and capture only digits, you can change the capture group to do that.

(\d+)

You'll need to merge all of the captured groups at the end, as you can't skip over pieces of the input without closing the capture group.
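As a minimal sketch of that merging step (assuming Python's `re` module; the variable names here are illustrative, not from the question), `re.findall` returns each digit run separately and `str.join` stitches them together:

```python
import re

text = "2km739"

# Each run of digits is returned as its own string: one per match of \d+.
parts = re.findall(r"\d+", text)

# Merge the pieces afterwards, since one group cannot skip over "km".
digits = "".join(parts)

print(parts)
print(digits)
```

If the surrounding text contains other numbers, it may be safer to first locate the whole token with something like `\d+km\d+` and then strip the non-digits from that match only.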

Bricky

In your regex you use [km], which is the notation for a character class: it matches a single character, either k or m, not the sequence km.
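A small sketch of that difference, assuming Python's `re` module and the sample string from the question (the variable names are illustrative):

```python
import re

text = "2km739"

# [km] is a character class: it consumes exactly one character, k or m.
one_char = re.search(r"\d[km]", text).group()

# The literal sequence km consumes both letters.
both_chars = re.search(r"\dkm", text).group()

# With a single [km], the following \d+ would have to start at "m",
# which is not a digit, so the overall pattern finds nothing.
no_match = re.search(r"\d[km]\d+", text)

print(one_char, both_chars, no_match)
```

The same reasoning explains why the lookaround version fails: lookarounds are zero-width, so nothing in the pattern ever consumes the letters between the digits.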

Maybe it is an option to capture the groups in a positive lookahead and then join them:

^(?=(\d)km(\d+))

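import re

text = "2km739"  # renamed from str to avoid shadowing the built-in
# Both groups sit inside a lookahead, so the overall match is empty but the groups are filled.
reobj = re.compile(r"^(?=(\d)km(\d+))")
match = reobj.search(text)
print(''.join(match.groups()))  # print() call so this also runs on Python 3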


The fourth bird