Python regex match any number of digits not immediately followed by period

Question

I have a list of multi-row strings. I want to match the first row of each string if it starts with a variable number of digits NOT immediately followed by a period.

For example, a list might be

list = ["42. blabla \n foo", "42 blabla \n foo", "422. blabla \n foo"]

and my desired output would be 42 blabla.

This code

import re 

list = ["42. blabla \n foo", "42 blabla \n foo", "422. blabla \n foo"]

regex_header = re.compile(r"^[0-9]+(?!\.).*\n")

for str in list:
    print(re.findall(regex_header, str))

outputs

['42. blabla \n']
['42 blabla \n']
['422. blabla \n']

This one works only with exactly two digits at the beginning of the string:

import re 

list = ["42. blabla \n foo", "42 blabla \n foo", "422. blabla \n foo"]

regex_header = re.compile(r"^[0-9]{2}(?!\.).*\n")

for str in list:
    print(re.findall(regex_header, str))

Output:

[]
['42 blabla \n']
['422. blabla \n']

score 2 · Accepted Answer · answered Jul 05 '19 at 16:49

You need a (?![.\d]) negative lookahead:

r"^\d+(?![.\d])"

See the regex demo. Details:

^ - start of string
\d+ - 1+ digits
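(?![.\d]) - a negative lookahead that fails the match if the character immediately to the right of the current location is a dot or a digit. The \d part matters: without it, the engine backtracks to a shorter run of digits (e.g. just 4 in 42., where the next character is 2 rather than a dot) and the match still succeeds.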
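To make the backtracking point concrete, here is a minimal sketch comparing a lookahead that rejects only a dot with one that rejects a dot or a digit. The helper name `matched_digits` and the variable names are made up for illustration; the sample strings are the ones from the question.

```python
import re

samples = ["42. blabla \n foo", "42 blabla \n foo", "422. blabla \n foo"]

# Rejects only a dot: the engine may give back digits until the lookahead passes.
dot_only = re.compile(r"^\d+(?!\.)")
# Rejects a dot or a digit: giving back digits no longer helps.
dot_or_digit = re.compile(r"^\d+(?![.\d])")

def matched_digits(pattern, text):
    """Return the leading digits the pattern matched, or None if it did not match."""
    m = pattern.search(text)
    return m.group(0) if m else None

dot_only_results = [matched_digits(dot_only, s) for s in samples]
dot_or_digit_results = [matched_digits(dot_or_digit, s) for s in samples]

print(dot_only_results)      # ['4', '42', '42']
print(dot_or_digit_results)  # [None, '42', None]
```

With the dot-only lookahead, `42.` still matches because `\d+` shrinks to `4`, which is followed by `2`. Adding `\d` to the lookahead closes that escape route.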

See the Python demo:

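import re
strings = ["42. blabla \n foo", "42 blabla \n foo", "422. blabla \n foo"]
# Leading digits that are not followed by a dot or by another digit
regex_header = re.compile(r"^[0-9]+(?![.\d])")
for s in strings:
    if regex_header.search(s):
        print(s)
# Only the second string is printed: "42 blabla \n foo"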
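The demo above prints the whole matching string. If only the first row is wanted, as in the question, one possible variation is to append `.*` to the pattern and read the match itself. This is a sketch under that assumption, not part of the original answer; `header_of` is a hypothetical helper name.

```python
import re

samples = ["42. blabla \n foo", "42 blabla \n foo", "422. blabla \n foo"]

# The answer's lookahead, followed by .* to consume the rest of the first row.
# Without re.DOTALL, "." does not match "\n", so the match stops before the newline.
first_row = re.compile(r"^\d+(?![.\d]).*")

def header_of(text):
    """Return the first row if it starts with digits not followed by a dot, else None."""
    m = first_row.match(text)
    return m.group(0) if m else None

headers = [header_of(s) for s in samples]
print(headers)  # [None, '42 blabla ', None]
```

The trailing space before the newline is kept; call `.rstrip()` on the result if it should be dropped.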

answered Jul 05 '19 at 16:49

Wiktor Stribiżew

Just wondering re the `.` inside the `[ ]` -- do we not need to escape it here to `\.`? I don't quite understand why `\d` works as expected in the brackets, while things seem to be different for the `.`? Thanks! – patrick Jul 05 '19 at 16:55

@patrick Inside a character class, only ``\``, `-`, `^` and `]` should be escaped. The rest is treated as literal chars. `[.]` = `r'\.'`. See [What special characters must be escaped in regular expressions?](https://stackoverflow.com/a/400316/3832970) – Wiktor Stribiżew Jul 05 '19 at 16:58
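The escaping rule from this comment can be checked quickly. The snippet below is only an illustration with a made-up test string: it shows that a dot inside a character class is literal whether or not it is escaped, while a bare dot outside a class is a wildcard.

```python
import re

text = "a.b"

# Inside [...] an unescaped dot is a literal dot, so these three are equivalent.
in_class = re.findall(r"[.]", text)
in_class_escaped = re.findall(r"[\.]", text)
escaped = re.findall(r"\.", text)

# Outside a class, a bare dot matches any character except a newline.
bare = re.findall(r".", text)

print(in_class, in_class_escaped, escaped)  # ['.'] ['.'] ['.']
print(bare)                                 # ['a', '.', 'b']
```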

Emma · Answer 2 · 2019-07-05T16:48:40.060

0

My guess is that this expression might be what you are looking for:

import re 

list = ["42. blabla \n foo", "42 blabla \n foo", "422. blabla \n foo"]

regex_header = re.compile(r"^[0-9]+(?!\.)\D*$")

for str in list:
    print(re.findall(regex_header, str))

edited Jul 05 '19 at 16:48

answered Jul 05 '19 at 16:42

Emma

Perfect -- this works as desired, and replacing `$` with `\n` will return only the first row. – Sal Jul 05 '19 at 16:57

EDIT: this works as desired, unless the line contains digits further away from the beginning (e.g. if we replace the second string in list with `"42 blabla 00 \n foo"`). – Sal Jul 05 '19 at 17:14
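The limitation described in this comment can be reproduced with a short sketch. The second pattern is one possible workaround, borrowing the `(?![.\d])` lookahead from the accepted answer and using `.*` instead of `\D*`; the variable names are made up for illustration.

```python
import re

samples = ["42. blabla \n foo", "42 blabla 00 \n foo", "422. blabla \n foo"]

# \D* cannot step over the later "00", so the second string stops matching.
non_digit_tail = re.compile(r"^[0-9]+(?!\.)\D*\n")
# Rejecting a dot or a digit right after the leading number, then taking
# anything up to the newline, tolerates digits later in the row.
any_tail = re.compile(r"^[0-9]+(?![.\d]).*\n")

non_digit_results = [non_digit_tail.findall(s) for s in samples]
any_tail_results = [any_tail.findall(s) for s in samples]

print(non_digit_results)  # [[], [], []]
print(any_tail_results)   # [[], ['42 blabla 00 \n'], []]
```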
