Matching numbers in strings in regex and converting into integers

Question

I'm trying to match all numbers in a given body of text using re.findall() and convert them to integers. I know that something like [0-9]+ or [\d]+ should match any numbers in the string; however, my output splits the numbers into individual digits (e.g. '125' becomes '1', '2', '5').

Here's what I have:

import re

regex_list = []

sample = "Here are a bunch of numbers 7746 and 12 and 1929 and 8827 and 7 and 8837 and 128 now convert them"

for line in sample:
    line = line.strip()
    if re.findall('([0-9]+)', line):
        regex_list.append(int(line))
print(regex_list)

Output:

[7, 7, 4, 6, 1, 2, 1, 9, 2, 9, 8, 8, 2, 7, 7, 8, 8, 3, 7, 1, 2, 8]

Desired Output:

[7746, 12, 1929, 8827, 7, 8837, 128]

The problem isn't the regex, the problem is your `for` loop. Take a look at the value of `line`... (That should've been one of the first things to do to debug this problem, by the way.) — Aran-Fey, Mar 26 '18 at 18:12
okay thanks for clarifying, I was not aware that using for loop would have this effect — David, Mar 26 '18 at 20:33

score 3 · Answer 1 · answered Mar 26 '18 at 18:11

Your issue is that iterating over a string yields one character at a time, so the regex only ever sees a single character. You can apply the regex to the entire string instead.

>>> import re    
>>> s = "Here are a bunch of numbers 7746 and 12 and 1929 and 8827 and 7 and 8837 and 128 now convert them"
>>> [int(j) for j in re.findall(r'[0-9]+', s)]
[7746, 12, 1929, 8827, 7, 8837, 128]
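
To see why the pattern itself was never the problem, here is a minimal sketch (the variable names are only for illustration) comparing the pattern with and without the + quantifier, and then feeding it one character at a time as the original loop did:

```python
import re

text = "125"

# With `+`, one match covers the whole run of consecutive digits.
whole = re.findall(r'[0-9]+', text)

# Without `+`, each digit is a separate match.
single = re.findall(r'[0-9]', text)

# Applying the pattern to one character at a time (as the original loop did)
# can never produce a multi-digit match, even with `+`.
per_char = [m for ch in text for m in re.findall(r'[0-9]+', ch)]

print(whole)     # ['125']
print(single)    # ['1', '2', '5']
print(per_char)  # ['1', '2', '5']
```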

score 2 · Answer 2 · answered Mar 26 '18 at 18:14

Have a look at @chrisz's answer for a better solution.

But, if you want to know what's wrong with yours:

Iterating over a string using a for loop gives you single characters, and not words as you thought. To get the words, you'll have to use split().

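import re

regex_list = []

sample = "Here are a bunch of numbers 7746 and 12 and 1929 and 8827 and 7 and 8837 and 128 now convert them"

# split() yields whitespace-separated words instead of single characters
for line in sample.split():
    line = line.strip()
    if re.findall('([0-9]+)', line):
        # int(line) works here because every matching word is all digits
        regex_list.append(int(line))

print(regex_list)
# [7746, 12, 1929, 8827, 7, 8837, 128]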

But since you are now getting the words individually, there's no need to use a regex. You can use isdigit() directly.

regex_list = []  # start again from an empty list

for line in sample.split():
    line = line.strip()
    if line.isdigit():
        regex_list.append(int(line))

Or, simply using a list comprehension:

num_list = [int(word) for word in sample.split() if word.isdigit()]
print(num_list)
# [7746, 12, 1929, 8827, 7, 8837, 128]
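
One difference between the two approaches is worth keeping in mind: split() plus isdigit() only accepts words made up entirely of digits, so a number with punctuation attached is skipped, while the regex still finds it. A small sketch with a made-up sentence (text is a hypothetical input, not from the question):

```python
import re

# Hypothetical input with punctuation attached to the numbers.
text = "Order 12, then 1929."

# "12," and "1929." are not all digits, so isdigit() rejects both words.
by_split = [int(word) for word in text.split() if word.isdigit()]

# The regex extracts the digit runs regardless of surrounding punctuation.
by_regex = [int(m) for m in re.findall(r'[0-9]+', text)]

print(by_split)  # []
print(by_regex)  # [12, 1929]
```

For text like the sample in the question, where every number is separated by whitespace, both approaches give the same result.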

score 1 · Answer 3 · answered Mar 26 '18 at 18:16

for line in sample stores a single character in line on each iteration, unless sample is a list of lines.
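
A quick way to see the difference, sketched with made-up data (lines is a hypothetical list, not from the question):

```python
import re

sample = "12 and 7"

# Iterating over a string visits it one character at a time.
chars = list(sample)
print(chars)  # ['1', '2', ' ', 'a', 'n', 'd', ' ', '7']

# Iterating over a list of strings visits one whole line at a time,
# so the regex sees complete numbers.
lines = ["12 and 7", "128 now"]
numbers = []
for line in lines:
    numbers.extend(int(m) for m in re.findall(r'[0-9]+', line))
print(numbers)  # [12, 7, 128]
```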

answered by pratik mankar
