getting a set of numbers with regex in python

Question

Suppose that I have the following string:

string = "serial 7's 93-86-79-72-65 very slow, recall 3/3 "

Now, I want to find the set of numbers using regular expressions in Python. Note that the numbers must be preceded by "serial 7's". I have tried the following:

re.findall('(?<=serial 7\'s )(\d+, )', string)
re.findall('(?<=serial 7\'s )(\d+, )+', string)

Nothing seems to work. Note that there might be an unknown number of integers we are trying to extract. I only want numbers with the specific pattern, not other numbers that might be scattered within the text.

Expected output: ['93','86','79','72','65']
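
For reference, a minimal sketch of why the two attempts above come back empty, assuming the same sample string: both patterns require a comma and a space after each run of digits, while the numbers in the string are separated by hyphens. The names `attempt_1` and `attempt_2` are only illustrative, and raw strings are used here to avoid escaping issues.

```python
import re

string = "serial 7's 93-86-79-72-65 very slow, recall 3/3 "

# Both attempts demand ", " after every run of digits,
# but the digits in the sample are followed by "-" or " ".
attempt_1 = re.findall(r"(?<=serial 7's )(\d+, )", string)
attempt_2 = re.findall(r"(?<=serial 7's )(\d+, )+", string)

print(attempt_1)  # []
print(attempt_2)  # []
```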

I wish I could accept multiple answers. Thanks everyone! – ssm Jun 29 '20 at 11:14

score 5 · Answer 1 · answered Jun 29 '20 at 09:34

Another way to do it using one regular expression:

import re

string = "serial 7's 93-86-79-72-65 very slow, recall 3/3 "

regex = r"(?<=serial 7's) (\d+-?)+"

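# Search the string defined above (the original referenced an undefined name).
matches = re.finditer(regex, string)

for match in matches:
    # Drop the leading space, then split the hyphenated run into numbers.
    integers = match.group(0).strip().split("-")
    print(integers) # ['93', '86', '79', '72', '65']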

score 4 · Answer 2 · answered Jun 29 '20 at 09:24

I would use re.findall here combined with split:

import re

string = "serial 7's 93-86-79-72-65 very slow"
matches = re.findall(r"\bserial 7's (\S+)", string)
nums = matches[0].split('-')
print(nums)

This prints:

['93', '86', '79', '72', '65']

Tim Biegeleisen

May I know what `\S` represents? I am inclined to accept this answer – ssm Jun 29 '20 at 09:30

`\S+` will match any continuous run of non-whitespace characters. In this case, it matches the hyphenated string of numbers. – Tim Biegeleisen Jun 29 '20 at 09:42
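
To illustrate the point about `\S+`, here is a small sketch. The second input string is made up for the demonstration and does not come from the question. Because `\S+` accepts any non-whitespace characters, it captures whatever token follows "serial 7's", whether or not it consists only of digits and hyphens.

```python
import re

pattern = r"\bserial 7's (\S+)"

# The sample from the answer: the token is a clean hyphenated run.
clean = re.findall(pattern, "serial 7's 93-86-79-72-65 very slow")

# A hypothetical messy input: \S+ keeps going until the next whitespace.
messy = re.findall(pattern, "serial 7's 93-86-79,slow recall 3/3")

print(clean)                # ['93-86-79-72-65']
print(messy)                # ['93-86-79,slow']
print(messy[0].split("-"))  # ['93', '86', '79,slow']
```

If the input can look like the second case, a stricter pattern built from `\d+` and `-` may be the safer choice.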

JvdV · Accepted Answer · 2020-06-29T10:11:45.197

My two cents, you could use the below pattern with re.search:

\bserial 7's\s(\d+(?:-\d+)*)

import re
s = "serial 7's 93-86-79-72-65 very slow, recall 3/3 "
res = re.search(r"\bserial 7's\s(\d+(?:-\d+)*)", s)
if res:
    print(res.group(1).split('-')) # ['93', '86', '79', '72', '65']
else:
    print('No match')

I'd first check whether a match occurs at all. The pattern requires at least one number and, if there are multiple values, that they are delimited by a hyphen, since you mentioned: "Note that there might be an unknown number of integers we are trying to extract. I only want numbers with the specific pattern."

\b - Word boundary.
serial 7's - Match "serial 7's" literally.
\s - A single whitespace character.
( - Open capture group.
\d+ - Match at least a single digit.
(?:-\d+)* - Non-capture group for zero or more times a hyphen followed by at least a single digit.
) - Close capture group.
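
The breakdown above can also be written into the pattern itself with `re.VERBOSE`. This is only a sketch of the same pattern: the names `SERIAL_SEVENS` and `extract_serial_sevens` are hypothetical, and the space inside "serial 7's" has to be escaped because verbose mode ignores unescaped whitespace.

```python
import re

SERIAL_SEVENS = re.compile(
    r"""
    \b serial\ 7's   # literal text at a word boundary (space escaped for VERBOSE)
    \s               # a single whitespace character
    (                # capture group 1: the whole hyphenated run
        \d+          #   the first number
        (?:-\d+)*    #   zero or more "-number" continuations
    )
    """,
    re.VERBOSE,
)

def extract_serial_sevens(text):
    """Return the hyphen-delimited numbers after "serial 7's", or [] if absent."""
    match = SERIAL_SEVENS.search(text)
    return match.group(1).split("-") if match else []

print(extract_serial_sevens("serial 7's 93-86-79-72-65 very slow, recall 3/3 "))
# ['93', '86', '79', '72', '65']
print(extract_serial_sevens("recall 3/3"))
# []
```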

Alternatively one could use regex module instead and go with a non-fixed width positive lookbehind:

(?<=\bserial 7's\s+(?:\d+-)*)\d+

import regex
s = "serial 7's 93-86-79-72-65 very slow, recall 77 3/3 "
lst = regex.findall(r"(?<=\bserial 7's\s+(?:\d+-)*)\d+", s)
print(lst) # ['93', '86', '79', '72', '65']

(?<= - Start of the positive lookbehind.
- \b - A word boundary.
- serial 7's - Literally "serial 7's".
- \s+ - One or more whitespace characters.
- (?: - Open non-capture group.
  - \d+- - Match at least a single digit followed by a hyphen.
  - )* - Close non-capture group and match it zero or more times.
- ) - Close positive lookbehind.
\d+ - Match at least a single digit.
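
As a quick check of why the third-party regex module is needed here: the standard library's `re` only accepts fixed-width lookbehinds, so it refuses to compile this pattern. The helper name `compiles_with_stdlib_re` is made up for this sketch.

```python
import re

variable_width = r"(?<=\bserial 7's\s+(?:\d+-)*)\d+"
fixed_width = r"(?<=serial 7's )\d+"

def compiles_with_stdlib_re(pattern):
    """Return True if the standard-library re module can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # re rejects lookbehinds whose width is not fixed (\s+ and (...)* here).
        return False

print(compiles_with_stdlib_re(variable_width))  # False
print(compiles_with_stdlib_re(fixed_width))     # True
```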

The fourth bird · Answer 4 · 2020-06-29T10:20:07.263

If you can make use of the regex module, you could also use \G and \K

(?:\bserial 7's |\G(?!^))-?\K\d+

Explanation

(?: Non capture group
- \bserial 7's Match serial 7's and space
- | Or
- \G(?!^) The \G anchor matches at 2 positions: at the beginning of the string, or at the end of the previous match. We don't want the match to start at the beginning, so exclude that using a negative lookahead.
)
-?\K Match optional - and reset the match buffer (forget what is matched until now)
\d+ Match 1+ digits
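
If installing the regex module is not an option, the chaining that \G provides can be approximated with the standard library by matching each step at the position where the previous one ended. This is a rough sketch, not an exact equivalent: the names `ANCHOR`, `STEP` and `chained_numbers` are hypothetical, and unlike the pattern above it only handles the first "serial 7's " in the text.

```python
import re

ANCHOR = re.compile(r"\bserial 7's ")  # where the chain starts
STEP = re.compile(r"-?(\d+)")          # one link: optional hyphen, then digits

def chained_numbers(text):
    """Collect numbers that directly follow "serial 7's ", one link at a time."""
    start = ANCHOR.search(text)
    if start is None:
        return []
    numbers = []
    pos = start.end()
    while True:
        # match() with a position only succeeds exactly at pos,
        # which plays the role of \G (the end of the previous match).
        link = STEP.match(text, pos)
        if link is None:
            break
        numbers.append(link.group(1))
        pos = link.end()
    return numbers

print(chained_numbers("serial 7's 93-86-79-72-65 very slow, recall 3/3 "))
# ['93', '86', '79', '72', '65']
```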

Example code

import regex

pattern = r"(?:\bserial 7's |\G(?!^))-?\K\d+"
string = "serial 7's 93-86-79-72-65 very slow, recall 3/3 "

print(regex.findall(pattern, string))

Output

['93', '86', '79', '72', '65']

score 2 · Answer 5 · answered Jun 29 '20 at 09:29

You can try:

import re

string = "serial 7's 93-86-79-72-65 very slow"

# Simple regex to find numbers
reg = re.compile(r"\d+")

# We want to search as short a string as possible,
# so keep only the part that follows "serial 7's" and search there.
res = reg.findall(string.split("serial 7's")[1])

print(res) # ['93', '86', '79', '72', '65']

Aaron_ab

Your solution will _fail_ should the input string contain numbers other than the ones which appear in the `X-Y-Z` string. – Tim Biegeleisen Jun 29 '20 at 10:05
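
A short sketch of the failure the comment describes, using the full string from the question (which also contains "recall 3/3") instead of the shortened one in the answer:

```python
import re

string = "serial 7's 93-86-79-72-65 very slow, recall 3/3 "

# Everything after "serial 7's" is searched, including the trailing "3/3".
tail = string.split("serial 7's")[1]
res = re.findall(r"\d+", tail)

print(res)  # ['93', '86', '79', '72', '65', '3', '3']
```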

score 0 · Answer 6 · answered Dec 13 '20 at 22:06

Use the PyPI regex module and capture the numbers:

import regex  # pip install regex
string = "serial 7's 93-86-79-72-65 very slow, recall 3/3 "
pattern = r"serial\s+7's\s+(?:-?(\d+))+"
match = regex.search(pattern, string)
if match:
    print(match.captures(1))
# ['93', '86', '79', '72', '65']

Expression explanation

--------------------------------------------------------------------------------
  serial                   'serial'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  7's                      '7\'s'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (1 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    -?                       '-' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
  )+                       end of grouping
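
For comparison, a sketch of what the same pattern yields with the standard library's `re`. A repeated capture group there keeps only its last iteration, which is why this answer relies on `captures()` from the regex module. The variable name `last_only` is just for illustration.

```python
import re

string = "serial 7's 93-86-79-72-65 very slow, recall 3/3 "
pattern = r"serial\s+7's\s+(?:-?(\d+))+"

match = re.search(pattern, string)

# With re, group 1 holds only the final repetition of the capture group.
last_only = match.group(1) if match else None

print(last_only)  # 65
```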
