Iterating through tables in text file

Question

Hello everyone.

This is the first task where I have no clear idea where to start:

Create a text file (using an editor, not necessarily Python) containing two tab-separated columns, with each column containing a number. Then use Python to read through the file you’ve created. For each line, multiply each first number by the second, and then sum the results from all the lines. Ignore any line that doesn’t contain two numeric columns.

So far I have written a couple of lines, but I am not sure where to go next:

filename = 'path'

def sum_columns(filename):
    sum = 0
    multiply = 0
    with open (filename) as f:

Should I split my file with 2 columns and create a list of them, or should I do something else?

Thank you in advance

**Don't** use `readlines()`, instead use `for line in f: ...` — martineau, Jul 26 '21 at 15:26
Tip: Using `sum` as a variable name is overwriting the builtin `sum()` function which you might need later. — S3DEV, Jul 26 '21 at 15:27
@martineau Why is it preferable to use "for line in f" instead of readlines ? — Achille G, Jul 26 '21 at 15:28
@AchilleG Perhaps because you would store every line in memory, instead of reading them sequentially...but still, if the file is relatively small, you can use readlines. — Nastor, Jul 26 '21 at 15:30
@AchilleG - One reason is efficiency. You’ll end up iterating the list of lines anyways, and in later Python versions, the context manager supports the iteration of `f`, so no need to store all lines in memory. — S3DEV, Jul 26 '21 at 15:30
@Achille G: Because `readlines` reads the entire file into memory whereas `for line in f:` processes them iteratively (one-at-a-time) which is usually preferable. — martineau, Jul 26 '21 at 15:30
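
To make the point in the comments above concrete, here is a minimal sketch comparing the two ways of reading. The helper names `read_all_at_once` and `read_lazily`, the sample data, and the temporary file are made up for illustration; they are not from the question.

```python
import os
import tempfile


def read_all_at_once(path):
    # readlines() builds a list holding every line of the file in memory.
    with open(path) as f:
        return f.readlines()


def read_lazily(path):
    # Iterating over the file object yields one line at a time,
    # so only the current line needs to be held in memory.
    with open(path) as f:
        for line in f:
            yield line


# Hypothetical sample data, written to a temporary file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\t2\n3\t4\n")
    sample_path = tmp.name

eager_lines = read_all_at_once(sample_path)
lazy_lines = list(read_lazily(sample_path))
os.remove(sample_path)

# Both approaches see exactly the same lines; they differ only in
# how much of the file is held in memory at once.
print(eager_lines == lazy_lines)
```

For a small exercise file either approach works; the difference only starts to matter as the file grows.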

score 1 · Accepted Answer · answered Jul 26 '21 at 15:46

Here is a short solution:

def sum_columns(filename):
    counter = 0
    with open(filename) as file:
        for line in file:
            try:
                a, b = [int(x) for x in line.split('\t')]
                counter += a * b
            except ValueError:
                continue
    return counter


file_name = 'myfile.txt'
print(sum_columns(file_name))

This is what several people (@martineau being the first) suggested in the comments. It is also something I learned just now, so I decided to put it in an answer.

In short: the loop iterates over each line and, for each one, builds a list of two integers. The list comprehension does the conversion, because `split` returns strings and multiplying two strings raises a TypeError. The two values are then unpacked into `a` and `b`. This is convenient because a single `except` is enough: the only reasonable error here is ValueError, raised either when the unpacking fails (the line does not have exactly two fields) or when a field cannot be converted to an integer. Finally, the two values are multiplied and added to the counter, and the counter is returned once the loop ends.
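
As a quick check of the behaviour described above, the sketch below runs the same function against a small sample file. The file contents and the temporary file are assumptions made up for the demo, not part of the original exercise.

```python
import os
import tempfile


def sum_columns(filename):
    counter = 0
    with open(filename) as file:
        for line in file:
            try:
                # Raises ValueError if there are not exactly two fields
                # or if a field is not an integer.
                a, b = [int(x) for x in line.split('\t')]
                counter += a * b
            except ValueError:
                continue
    return counter


# Hypothetical sample: two valid lines and three that should be ignored
# (non-numeric field, only one field, three fields).
sample = "1\t2\n3\t4\nabc\t5\n7\n1\t2\t3\n"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

result = sum_columns(sample_path)
os.remove(sample_path)

print(result)  # 1*2 + 3*4 = 14
```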

@PreacherBaby it is a tab character, basically it splits by tab character and returns a list of strings where tab character is excluded, therefore if for example there was tab between two numbers and they were the only ones in the string, then it would return a list of those two numbers only — Matiiss, Jul 27 '21 at 12:02
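
A small sketch of what the comment above describes, using made-up input strings. It also shows why the trailing newline on each line is not a problem: `int()` ignores surrounding whitespace.

```python
line = "3\t4\n"  # hypothetical line as read from the file

# Splitting on the tab character drops the tab and keeps the rest.
fields = line.split("\t")
print(fields)  # ['3', '4\n']

# int() ignores surrounding whitespace, including the trailing newline.
numbers = [int(field) for field in fields]
print(numbers)  # [3, 4]

# A line without a tab yields a single field, so unpacking it into
# two names fails with ValueError and the line can be skipped.
try:
    a, b = "7\n".split("\t")
    unpacked = True
except ValueError:
    unpacked = False
print(unpacked)  # False
```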

score 0 · Answer 2 · answered Jul 26 '21 at 15:27

The exercise text leaves you several options. In my opinion, the best way would be something like this:

filename = 'path'

def sum_columns(filename):
    total = 0  # named total so the builtin sum() is not shadowed
    with open(filename) as f:
        all_lines = f.readlines()
    f.close()  # redundant: the with block has already closed the file
    for line in all_lines:
        splitted = line.split("\t")
        if len(splitted) != 2:
            continue  # not exactly two columns
        try:
            total += int(splitted[0]) * int(splitted[1])
        except ValueError:
            continue  # a column is not a number
    return total

This reads every line of the file into all_lines. You then iterate over the lines, split each one on the tab character, skip any line that doesn't hold two numbers, multiply the two values, and add the product to total, which starts at 0 and is returned at the end. As hinted by someone else, you could also read the file line by line without storing every line in a list, but if the file is relatively small, you can go with my option.

Nastor


Why read the whole file to then iterate on its lines ? Just iterate on the file lines instead... – Loïc Jul 26 '21 at 15:31
Because it doesn't make much difference unless the file you're reading into is very large. – Nastor Jul 26 '21 at 15:33
Yes, so it's either a bit worse or a lot worse. Also, you call `f.close()`, which isn't needed since the `with` context manager does that for you... – Loïc Jul 26 '21 at 15:40

tituszban · Answer 3 · 2021-07-26T15:39:31.747

If you have a file like this:

1   2
2   4
4   8

You can do the following:

from functools import reduce

def is_int(s):
    try: 
        int(s)
        return True
    except ValueError:
        return False

filename = 'path'

def sum_columns(filename):
    with open (filename) as f:
        lines = f.readlines()
    return sum([
        reduce(lambda x, y: x * y, map(int,line.split("\t")))
        for line in lines
        if len(list(filter(is_int, line.split("\t")))) == 2
    ])

Explanation:

At the top I define a helper function that determines whether a string can be converted into an int. This will be used later to ignore lines that don't have 2 numbers. It's based on this answer

def is_int(s):
    try: 
        int(s)
        return True
    except ValueError:
        return False

Then we open the file and read all lines into a variable. This is not the most efficient approach, since the file can be processed line by line without storing the whole file, but for smaller files the difference is negligible.

with open (filename) as f:
    lines = f.readlines()

Next comes a single expression that performs the calculation, but let's break it down:

First, we iterate through all the lines:

for line in lines

Next, we only keep the lines that have exactly two numbers separated by tabs:

if len(list(filter(is_int, line.split("\t")))) == 2

Finally, we turn each number in the line into ints, and multiply them all together:

reduce(lambda x, y: x * y, map(int,line.split("\t")))

We then sum all of these products and return the result.

Performance consideration

If performance is a concern, you can achieve the same thing by reading the contents line by line instead of pulling the whole file into a variable. It is less elegant, but more efficient:

def sum_columns(filename):
    total = 0
    with open (filename) as f:
        for line in f:
            if len(list(filter(is_int, line.split("\t")))) != 2:
                continue
            total += reduce((lambda x, y: x * y), map(int,line.split("\t")))
    return total

(Note that you still need the import and the helper from the above example.)
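
One edge case worth checking: the filter above counts the integer fields, so a line such as `1\t2\tabc` passes the check (it has two integer fields) and then fails inside `map(int, ...)` with a ValueError. A possible stricter variant is sketched below, under the assumption that only lines with exactly two integer fields should count. The names `parse_pair` and `sum_columns_strict`, the sample data, and the temporary file are hypothetical and not part of the answer above.

```python
import os
import tempfile


def parse_pair(line):
    """Return (a, b) if the line has exactly two integer fields, else None."""
    fields = line.split("\t")
    if len(fields) != 2:
        return None
    try:
        return int(fields[0]), int(fields[1])
    except ValueError:
        return None


def sum_columns_strict(filename):
    total = 0
    with open(filename) as f:
        for line in f:
            pair = parse_pair(line)
            if pair is None:
                continue  # not exactly two integer columns
            total += pair[0] * pair[1]
    return total


# Hypothetical sample, including a line with a third, non-numeric field.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\t2\n2\t4\n4\t8\n1\t2\tabc\n")
    sample_path = tmp.name

result = sum_columns_strict(sample_path)
os.remove(sample_path)

print(result)  # 1*2 + 2*4 + 4*8 = 42
```

Parsing each line once also avoids splitting it twice, as the filter-then-map version does.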

score 0 · Answer 4 · answered Jul 26 '21 at 15:43

input.txt

script.py

with open('input.txt') as f:
  total = 0
  for line in f:
    # line is already a string, so split it directly
    numbers = line.split('\t')
    try:
      line_value = int(numbers[0]) * int(numbers[1])
    except IndexError:
      # the line doesn't contain two columns
      continue
    except ValueError:
      # a value couldn't be converted to a number
      continue
    total += line_value

print(total)
