How to say "match anything until a specific character, then work your way backwards"?

Question

I am often faced with patterns where the interesting part is delimited by a specific character and the rest does not matter. A typical example:

/dev/sda1       472437724  231650856 216764652  52% /

I would like to extract 52 (which can also be 9, or 100 - so 1 to 3 digits) by saying "match anything, then when you get to % (which is unique in that line), see before for the matches to extract".

I tried to code this as .*(\d*)%.* but the group is not matched:

.* match anything, any number of times
% ... until you get to the literal % (the \d is also matched by .*, but my understanding is that once % is matched, the regex engine will work backwards, since it now has an "anchor" from which to analyze what came before; please tell me if this reasoning is incorrect, thank you)
(\d*) ... and now before that % you had a (\d*) to match and group
.* ... and the rest does not matter (match everything)

You match nothing because the digits are optional. Try using a word boundary or match a space before `^.*\b(\d+)%.*` https://regex101.com/r/niKGIX/1 — The fourth bird, Aug 02 '19 at 14:26
I am not sure it’s relevant in your case, but I had to solve a similar “backwards” problem. What I ended up doing was reversing the string and then writing a regex that operated on the reversed string. Worked very well as the particular data structure was easier to parse right to left. — JL Peyret, Aug 04 '19 at 05:37

score 3 · Accepted Answer · answered Aug 02 '19 at 14:31

Your regex does not work because .* matches too much and the group matches too little. Because of the * quantifier, the group \d* is allowed to match nothing, which leaves everything before the % to be matched by the .*.

And your description of .* is somewhat incorrect. It actually matches everything until the end of the string, and then moves backwards until the rest of the pattern after it ((\d*)%.*) matches. For more info, see here.
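To see where each character ends up, here is a minimal sketch that wraps both .* parts of the question's pattern in extra capturing groups. The extra groups and the variable names are added here purely for illustration:

```python
import re

line = "/dev/sda1       472437724  231650856 216764652  52% /"

# The question's pattern, with capturing groups added around both .* parts
# so that each part's share of the match is visible.
m = re.match(r"(.*)(\d*)%(.*)", line)

leading, digits, trailing = m.groups()
print(repr(leading))   # everything before the %, digits included
print(repr(digits))    # what (\d*) was left with
print(repr(trailing))  # the rest of the line
```

The first .* gives back only as many characters as it must: it stops backtracking as soon as % can match, and at that point (\d*) is satisfied with zero digits.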

In fact, I think your text can be matched simply by:

(\d{1,3})%

And getting group 1.

The logic of "keep looking until you find..." is baked into the regex engine, so you don't need to explicitly say .* unless you want it in the match. In this case you just want the number before the %, right?
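A minimal sketch of this in Python, assuming re.search is used so that the engine scans for the first place the pattern fits. The helper name extract_percent is hypothetical:

```python
import re

def extract_percent(line):
    """Return the 1-3 digit number directly before the first '%', or None."""
    m = re.search(r"(\d{1,3})%", line)
    return int(m.group(1)) if m else None

print(extract_percent("/dev/sda1       472437724  231650856 216764652  52% /"))
```

Because re.search tries every starting position from left to right, no leading .* is needed to skip the uninteresting part of the line.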

score 2 · Answer 2 · answered Aug 02 '19 at 14:37

If you only want to extract the number, then I would use:

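import re

# \d+ requires at least one digit; with \d* an empty match right before
# the % could be reported as well. (?=%) checks for % without consuming it.
pattern = r"\d+(?=%)"
string = "/dev/sda1   472437724  231650856 216764652  52% /"
returnedMatches = re.findall(pattern, string)  # ['52']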

The regular expression uses a positive lookahead, (?=%), so the digits are matched only when a % follows them, and the % itself is not part of the match.
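With a lookahead, the choice of quantifier matters. Because \d* also accepts zero digits, re.findall can report an additional empty match at the position of the % itself on Python 3.7 and later, where an empty match may directly follow a non-empty one. A small sketch comparing the two quantifiers, with illustrative variable names:

```python
import re

line = "/dev/sda1   472437724  231650856 216764652  52% /"

# \d* may match zero digits, so it can also succeed on an empty string
# located right before the %.
loose = re.findall(r"\d*(?=%)", line)

# \d+ insists on at least one digit.
strict = re.findall(r"\d+(?=%)", line)

print(loose)
print(strict)
```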

Answer 3 · The fourth bird · answered Aug 02 '19 at 14:48

In your pattern, the first .* matches until the end of the string. Then it backtracks, giving up as little as possible, until it can match zero or more digits followed by a %.

The % is matched because matching zero digits is allowed. Then the second .* matches until the end of the string. The capturing group does take part in the match, only it is empty.

What you might do is add a word boundary or a space before the digits:

.* (\d{1,3})%.* or .*\b(\d{1,3})%.*

Regex demo 1 Or regex demo 2
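The effect of the space or word boundary can be checked with a short sketch. The dictionary keys are just illustrative labels. Without an anchor in front of the digits, the greedy .* gives back only a single digit:

```python
import re

line = "/dev/sda1       472437724  231650856 216764652  52% /"

patterns = {
    "no anchor": r".*(\d{1,3})%.*",
    "space": r".* (\d{1,3})%.*",
    "word boundary": r".*\b(\d{1,3})%.*",
}

# Capture group 1 for each variant of the pattern.
results = {name: re.match(p, line).group(1) for name, p in patterns.items()}
for name, value in results.items():
    print(f"{name}: {value!r}")
```

The space or \b forces the .* to keep backtracking until the group starts at the beginning of the number.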

Note that with a greedy .* you will get the last occurrence of digits followed by a % sign.

If you make it non-greedy, you will match the first occurrence instead:

.*?(\d{1,3})%.*

Regex demo
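The difference only shows up when a line contains more than one percentage. A minimal sketch, using a made-up sample line (not from the question) with two % signs:

```python
import re

# Hypothetical input with two percentages, for illustration only.
sample = "cpu 10% mem 20% disk"

# Greedy: the leading .* swallows as much as it can, so the group
# ends up on the last number followed by %.
greedy = re.match(r".*\b(\d{1,3})%.*", sample).group(1)

# Non-greedy: the leading .*? expands only as far as needed, so the
# group ends up on the first number followed by %.
lazy = re.match(r".*?(\d{1,3})%.*", sample).group(1)

print(greedy, lazy)
```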

score 1 · Answer 4 · answered Aug 02 '19 at 15:38

By default, regex quantifiers match as greedily as possible. The initial .* in your pattern ends up matching everything up to the %:

"/dev/sda1       472437724  231650856 216764652  52"

This is acceptable for the regex, because it just chooses to have the next pattern, (\d*), match 0 characters.

In this scenario a couple of options could work for you. My main recommendation is to use the preceding space to define a sequence which "starts with a single space, contains any number of digits in the middle, and ends with a percentage symbol":

' (\d*)%'
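A quick sketch of how this pattern behaves with re.search. The second input, "load %", is a made-up edge case: since \d* allows zero digits, a space directly followed by % also matches, with an empty group. Using \d+ instead is one way to rule that out, if it matters for your data:

```python
import re

line = "/dev/sda1       472437724  231650856 216764652  52% /"

m = re.search(r" (\d*)%", line)
print(m.group(1))

# Caveat: \d* also allows zero digits, so " %" alone is enough to match.
empty = re.search(r" (\d*)%", "load %")
print(repr(empty.group(1)))

# With \d+ the same input is rejected.
strict = re.search(r" (\d+)%", "load %")
print(strict)
```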

score 0 · Answer 5 · answered Aug 02 '19 at 15:29

Try this:

.*(\b\d{1,3}(?=\%)).*

demo

Mohammad Ali Amini

