
With the lists of numbers returned by re.findall() (as strings), how can we compute their sum as ints?

Example file (sample.txt):

A number: 606
Another number could be 52 or 38
Another number below:
55

This prints the list of RegEx captures for each line:

import re

with open("sample.txt", "r") as fileHandle:
    for line in fileHandle:
        # RegEx: match all ints anywhere in the line (raw string avoids an invalid-escape warning)
        num = re.findall(r"\d+", line)
        print(num)

Output:

['606']
['52', '38']
[]
['55']
Reubens4Dinner

3 Answers


You have to convert the items of the (nested) list from str to int. You can do so in a single nested generator expression passed to the built-in sum function:

>>> sum(int(x) for line in filehandle for x in re.findall(r"\d+", line))    
751

Or without nesting, using read() to get the entire content of the file (if it's not too big):

>>> sum(int(x) for x in re.findall(r"\d+", filehandle.read()))             
751

Or using map instead of a generator expression:

>>> sum(map(int, re.findall(r"\d+", filehandle.read())))                   
751

Or if you want the sums per line (map version left as an exercise to the reader):

>>> [sum(int(x) for x in re.findall(r"\d+", line)) for line in filehandle] 
[606, 90, 0, 55]
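Filling in that exercise, a sketch of the per-line map version might look like this. The hard-coded `lines` list is an assumption standing in for the lines of sample.txt, so the snippet runs without the file:

```python
import re

# Stand-in for the lines of sample.txt (assumption: same content as in the question)
lines = [
    "A number: 606\n",
    "Another number could be 52 or 38\n",
    "Another number below:\n",
    "55\n",
]

# map(int, ...) converts each captured string; sum() of an empty map is 0
per_line = [sum(map(int, re.findall(r"\d+", line))) for line in lines]
print(per_line)  # [606, 90, 0, 55]
```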

(When you try those in the interactive shell, remember that the file will be "exhausted" after each of those, so you will have to re-open the file before testing the next one. Also note that \d+ only matches runs of digits, so you might get surprising results if your file contains e.g. floating point numbers or IP addresses: 3.14 would be counted as 3 and 14.)
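Instead of re-opening the file, you can also rewind it with `seek(0)`. A small sketch of the exhaustion effect, using `io.StringIO` as an assumed stand-in for the opened sample.txt (a real file object supports the same `read()` and `seek()` calls):

```python
import io
import re

# io.StringIO stands in for open("sample.txt")
fh = io.StringIO("A number: 606\nAnother number could be 52 or 38\nAnother number below:\n55\n")

first = sum(int(x) for x in re.findall(r"\d+", fh.read()))
# The file position is now at the end, so read() returns "" and the sum is 0
second = sum(int(x) for x in re.findall(r"\d+", fh.read()))

fh.seek(0)  # rewind to the start instead of re-opening
third = sum(int(x) for x in re.findall(r"\d+", fh.read()))

print(first, second, third)  # 751 0 751
```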

tobias_k
  • I wonder which one would be faster – RomanPerekhrest Jun 27 '19 at 14:42
  • @RomanPerekhrest Given that the input is read from a file, I doubt that it matters much. Without the file-reading part (using a multiline string or a list of strings respectively), the double-for-generator is a bit slower than the other two. – tobias_k Jun 27 '19 at 14:43
  • Good point on irregularities with floats and IPs. A capture that matches a pattern containing at most one decimal point might do for that, unless I'm missing something. – Reubens4Dinner Jun 27 '19 at 14:50
  • @Reubens4Dinner It might be a bit more complicated, see e.g. [here](https://stackoverflow.com/a/385597/1639625). – tobias_k Jun 27 '19 at 14:54
  • @Reubens4Dinner Oh, and of course, if you capture floating point numbers, remember to convert to `float` instead of `int`. – tobias_k Jun 27 '19 at 15:10
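A sketch of the float-aware approach discussed in these comments, assuming a pattern that allows at most one decimal point; the sample text is hypothetical. This pattern still does not handle signs, exponents, or IP addresses ("192.168.0.1" would become 192.168 and 0.1), which is why the linked answer uses more elaborate patterns:

```python
import re

text = "Weights: 2.5 kg and 10 kg"  # hypothetical input

# \d+ splits the decimal into two separate integers
ints_only = re.findall(r"\d+", text)           # ['2', '5', '10']
print(sum(int(x) for x in ints_only))          # 17

# Optional non-capturing group allows one decimal part; convert with float, not int
numbers = re.findall(r"\d+(?:\.\d+)?", text)   # ['2.5', '10']
print(sum(float(x) for x in numbers))          # 12.5
```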

Working code with explanations in comments

import re

sumInts = 0
with open("sample.txt", "r") as fileHandle:
    for line in fileHandle:
        # RegEx: match all ints anywhere in the line
        num = re.findall(r"\d+", line)
        # Convert the captured strings to ints (needed: re.findall returns strings)
        num = [int(i) for i in num]
        # Add every int from this line's RegEx capture to the running total
        for i in num:
            sumInts += i
print(sumInts)
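The int() conversion step is required, because re.findall() returns strings and sum() cannot add a string to its integer start value of 0. A quick check on one line of the sample:

```python
import re

num = re.findall(r"\d+", "Another number could be 52 or 38")
print(num)  # ['52', '38']  (strings, not ints)

try:
    sum(num)  # 0 + '52' is not defined, so this raises TypeError
except TypeError as err:
    print("TypeError:", err)

print(sum(int(i) for i in num))  # 90
```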
Reubens4Dinner

My preferred regex solution is to use an iterator and compute the sum as we parse the input string:

import re

# Named text and total to avoid shadowing the built-ins input() and sum()
text = """A number: 606
           Another number could be 52 or 38
           Another number below:
           55"""

total = 0

for match in re.finditer(r"\d+", text):
    total = total + int(match.group())

print("sum is: " + str(total))

This prints:

sum is: 751
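The same finditer approach can also be written as a single generator expression passed to the built-in sum(); a sketch, with the sample text inlined:

```python
import re

text = """A number: 606
Another number could be 52 or 38
Another number below:
55"""

# Each match object's group() is the matched digit string; int() converts it
total = sum(int(match.group()) for match in re.finditer(r"\d+", text))
print("sum is:", total)  # sum is: 751
```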
Tim Biegeleisen