Task: Find all the numbers in a text file and compute their sum.

Link to file (if required): http://python-data.dr-chuck.net/regex_sum_42.txt

name = raw_input("Enter your file: ")
if len(name) < 1: name = "sample.txt"

try: 
    open(name)
except:
    print "Please enter a valid file name."
    exit()

import re
lst = list()
for line in name:
    line = line.strip()  #strip() instead of rstrip() as there were space before line as well
    stuff = re.findall("[0-9]+", line)
    print stuff               # i tried to trace back and realize it prints empty list so problem should be here
    stuff = int(stuff[0])     # i think this is wrong as well 
    lst.append(stuff)
    sum(lst)

print sum(lst)

Can someone tell me where I went wrong? Sorry for any formatting errors, and thanks for the help.

I have also tried:

\s[0-9]+\s
.[0-9]+.
Wiktor Stribiżew
ming
  • Your first regex is correct (you could even change it to `(\d+)`); however, you need a global modifier and a capturing group to save the found numbers. See [this regex101 demo](https://regex101.com/r/mA9jV8/2). – Jan Dec 10 '15 at 11:38
  • No, OP does not have to use any capture groups. `re.findall` is performing a global search. I get an *IndexError: list index out of range* error. – Wiktor Stribiżew Dec 10 '15 at 11:47
  • @stribizhev: Ok, did not know that the re module handles this automatically. I'm more of a PHP guy where `preg_match_all()` needs capturing groups. – Jan Dec 10 '15 at 11:49
  • @Jan: `preg_match_all` does not require capturing groups either :) – Wiktor Stribiżew Dec 10 '15 at 11:59
  • Where do you read from the file? At the moment `name` only contains the actual *filename* and not the *content*, so `line` does not hold the content you're after. – Jan Dec 10 '15 at 12:52
  • Shortcut: it should be 597873. No program needed ;-) – Jan Dec 10 '15 at 14:01

1 Answer

You need to change your code to:

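import re

lst = []
with open(name) as f:  # name holds the file name entered earlier
    for line in f:     # iterate over the lines of the file, not the characters of name
        # re.findall returns every run of digits on the line (an empty list if there are none)
        stuff = [lst.append(int(x)) for x in re.findall("[0-9]+", line.strip())]
print sum(lst)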

See the IDEONE demo

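The problem was that `stuff[0]` fails whenever `re.findall` returns an empty list, which is what raises IndexError: list index out of range, and it also ignores any further numbers on the same line. When you convert each match to int and append it to the list (declared with lst = []) inside the comprehension, an empty result simply appends nothing, and the list you get is already flat.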
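To illustrate the point about empty results, here is a minimal sketch. The sample lines and the helper name `numbers_in` are made up for this example and are not taken from the actual data file; the code is written so that it also runs on Python 3.

```python
import re

# Hypothetical sample lines; the real file is not reproduced here.
lines = ["Why 3036 should 7209 you", "no digits on this line", "ends with 42"]

def numbers_in(lines):
    """Return every integer found in the given lines as one flat list."""
    found = []
    for line in lines:
        # findall returns a (possibly empty) list of digit runs for this line
        found.extend(int(x) for x in re.findall("[0-9]+", line))
    return found

# A line without digits gives an empty list, so indexing it with [0] would raise IndexError
empty = re.findall("[0-9]+", lines[1])
total = sum(numbers_in(lines))
print(total)
```

Using `extend` with a generator is one alternative to calling `append` inside a comprehension; both produce the same flat list.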

Also, you need to actually read the file in. In your code, `for line in name` iterates over the characters of the file name string, not over the lines of the file. "The with statement handles opening and closing the file, including if an exception is raised in the inner block. The for line in f treats the file object f as an iterable, which automatically uses buffered IO and memory management so you don't have to worry about large files." (source)
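As a rough, self-contained sketch of the file-reading part: the function name `sum_numbers_in_file` and the temporary file contents below are assumptions made for illustration, standing in for the real data file. It should run on Python 3 as well.

```python
import os
import re
import tempfile

def sum_numbers_in_file(path):
    """Sum every run of digits found in the text file at `path`."""
    total = 0
    with open(path) as f:   # the file is closed when the block exits
        for line in f:      # iterating a file object yields its lines
            total += sum(int(x) for x in re.findall("[0-9]+", line))
    return total

# Hypothetical stand-in for the data file
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("abc 12 def 30\nno numbers here\n  7 leading spaces\n")

result = sum_numbers_in_file(path)

# By contrast, iterating over a string yields single characters
chars = [c for c in "sample.txt"][:3]

os.remove(path)
print(result)
```

Note that no `strip()` call is needed here, since the pattern only matches digits and surrounding whitespace is ignored anyway.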

Community
Wiktor Stribiżew
  • I agree with you on this one, but if this is the OP's original code (and not just edited somehow), where does he *actually* read from the file? At the moment, the file is only opened but nothing happens afterwards, so `line` never holds the file's content. – Jan Dec 10 '15 at 12:51
  • I mean this should be used when the file is actually read of course. – Wiktor Stribiżew Dec 10 '15 at 12:56
  • I was just thinking out loud; maybe the error lies somewhere else. – Jan Dec 10 '15 at 12:56
  • @Jan: Thank you for spotting that. I was just focusing on the actual issue with parsing the contents and overlooked the file reading issue. – Wiktor Stribiżew Dec 10 '15 at 13:01