Exclude the middle of a capture group regex

Question

I have a string:

2km739

and I am trying to use a regex to capture 2739 from it.

I know I could just use two capture groups and combine them after (EDIT: or extract the numerical chars after I capture the group), but this would be a little easier in this situation and I am curious if this is possible.

I have this:

([0-9](?=[km])(?<=[km])\d+)

but it doesn't match anything.

It only matches if I add the km in there somewhere:

([0-9](?=[km])km(?<=[km])\d+)

I also thought this would work, but I learned that the text matched by a non-capturing group is still captured by the enclosing group:

([0-9](?:km)\d+)

Can't you just [remove all non-numeric chars](https://stackoverflow.com/questions/1249388/removing-all-non-numeric-characters-from-string-in-python)? — Wiktor Stribiżew, Mar 30 '18 at 21:19
@Wiktor I need the regex to find the 2km739, but you are right. I can do that afterward — moto, Mar 30 '18 at 21:21
Do you need to use regex? Are you just trying to capture the numbers in the sequence they appear? If so you can `''.join([x for x in list('2km739') if x.isnumeric()])` — Aaron Lael, Mar 30 '18 at 21:29

score 0 · Answer 1 · answered Mar 30 '18 at 21:25

If you want to remove all of the letters and capture only digits, you can change the capture group to do that.

(\d+)

You'll need to merge all of the captured groups at the end, as you can't skip over pieces of the input without closing the capture group.
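As a minimal sketch of that merge step, assuming Python's `re` module (the helper name `digits_only` is hypothetical and only for illustration), `re.findall` returns each captured digit run and the pieces are joined afterward:

```python
import re

def digits_only(text):
    # With one capture group, findall returns a list of the captured digit runs
    parts = re.findall(r"(\d+)", text)
    # Join the separate captures; a single group cannot skip over "km" itself
    return "".join(parts)

print(digits_only("2km739"))  # 2739
```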

score 0 · Accepted Answer · answered Mar 31 '18 at 11:07

In your regex you use [km], which is the notation for a character class and matches a single k or m, not the sequence km.
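A short sketch of why the patterns from the question behave as described, assuming Python's `re` module (the variable names are illustrative only): lookarounds check the text around the current position but never consume it, so they cannot step over "km".

```python
import re

text = "2km739"

# [km] matches exactly one character, either "k" or "m"
single = re.search(r"[km]", text).group()

# After "2" is consumed, the lookbehind needs the previous character to be
# "k" or "m", but it is "2"; nothing has moved past "km", so there is no match
original = re.search(r"([0-9](?=[km])(?<=[km])\d+)", text)

# Consuming "km" explicitly lets the match succeed, but the outer group
# then contains everything that was consumed, including "km"
with_km = re.search(r"([0-9](?=[km])km(?<=[km])\d+)", text).group(1)

print(single)    # k
print(original)  # None
print(with_km)   # 2km739
```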

Maybe it is an option to capture the groups in a positive lookahead and then join them:

^(?=(\d)km(\d+))

import re

text = "2km739"  # renamed from str to avoid shadowing the built-in
reobj = re.compile(r"^(?=(\d)km(\d+))")
match = reobj.search(text)
print(''.join(match.groups()))  # 2739
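If the goal is the joined digits as a single string, a different technique from the lookahead above is to let `re.sub` drop the middle by re-emitting only the two groups. This is a sketch under the assumption that the input always has the form digit, "km", digits; the helper name `strip_km` is hypothetical.

```python
import re

def strip_km(text):
    # \1 and \2 re-emit the two digit groups; the literal "km" between them is dropped
    return re.sub(r"(\d)km(\d+)", r"\1\2", text)

print(strip_km("2km739"))               # 2739
print(strip_km("route 2km739 closed"))  # route 2739 closed
```

Text that does not contain the pattern is returned unchanged, so the call is safe on arbitrary input.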

