
I am processing a text file with 6400 comma-separated numbers arranged as an 80 x 80 matrix, but I am getting:

invalid literal for int() with base 10: '' 

What's strange is that this code works on Windows but not on Mac (same Python version).

I've tried checking for extra spaces and commas using try/except, to no avail.

def read_numbers(file):
    with open(file) as f:
        text = f.read()
        text = text.replace("\n", ",")

    numbers = []
    for s in text.split(','):
        try:
            numbers.append(int(s))
        except ValueError, e:
            print "error",e, "with", "(",s,")"
    return numbers
Po Chen Liu

3 Answers


The line text = text.replace("\n", ",") is not needed. Instead, read the file line by line, skip empty lines, and split each line on commas:

def read_numbers(file):
    numbers = []
    with open(file) as f:
        for line in f:
            # you may also want to remove empty lines
            line = line.strip()
            if not line:
                continue

            for s in line.split(','):
                s = s.strip()
                if not s:
                    # skip empty fields, e.g. from a trailing comma
                    continue
                try:
                    numbers.append(int(s))
                except ValueError:
                    print("error with {!r}".format(s))

    return numbers
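Since the question describes an 80 x 80 matrix, it may help to keep the row structure instead of flattening everything into one list. The sketch below is an alternative using the standard-library csv module (not part of the original answer); the helper name read_matrix and the default size of 80 are assumptions for illustration. csv.reader yields an empty list for a blank line, so blank lines and empty fields (such as the one left by a trailing comma) are both dropped, and a ValueError is raised if the result is not square.

```python
import csv


def read_matrix(path, size=80):
    """Hypothetical helper: read a comma-separated size x size matrix of ints.

    Blank lines and empty fields (e.g. from a trailing comma) are skipped.
    Raises ValueError if the result is not size x size.
    """
    rows = []
    # the csv docs recommend newline='' when opening files for the csv module
    with open(path, newline='') as f:
        for record in csv.reader(f):
            # drop empty fields such as the one produced by a trailing comma
            values = [int(field) for field in record if field.strip()]
            if values:  # a blank line gives an empty record; skip it
                rows.append(values)
    if len(rows) != size or any(len(row) != size for row in rows):
        raise ValueError("expected a {0} x {0} matrix".format(size))
    return rows
```

Usage would be something like matrix = read_matrix("numbers.txt"), where the file name is assumed. The shape check turns a malformed file into one clear error instead of a string of per-value messages.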

EDIT:

I tried to reproduce your problem with the following:

list.txt:

12,43,54,65,7676,87,9898,0909,676,46556

2342,6556,7687
5465,76878,98090,9090,5656


33,434,3435,4545 ,5656

and then:

numbers = []
def read_numbers(file):
    with open(file, 'r') as f:
        content = f.read()
    for s in content.split(","):

        if ',' in s or '\n' in s:
            continue
        else:
            numbers.append(s)

    return numbers


print(read_numbers("list.txt"))

OUTPUT:

['12', '43', '54', '65', '7676', '87', '9898', '0909', '676', '6556', '76878', '98090', '9090', '434', '3435', '4545 ', '5656']
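Note that the output above is missing 46556, 2342, 7687, 5465, the first 5656, and 33. Splitting the whole file on commas leaves chunks such as '46556\n\n2342' that span a line break, and the '\n' in s check discards them. A minimal variant (not the answerer's code) that keeps those values splits on commas and on any newline characters at once with re.split, then drops the empty pieces and converts to int:

```python
import re


def read_numbers(file):
    numbers = []
    with open(file) as f:
        content = f.read()
    # treat commas and newline characters alike as separators, so values
    # at the start or end of a line are not lost
    for s in re.split(r'[,\r\n]+', content):
        s = s.strip()  # e.g. '4545 ' -> '4545'
        if s:  # ignore empty pieces, e.g. from a trailing newline
            numbers.append(int(s))
    return numbers
```

For the list.txt above, this returns all 23 values as ints, with '0909' becoming 909 and '4545 ' becoming 4545.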
DirtyBit

According to this answer, your code only works with Windows line endings, so my guess is that s in numbers.append(int(s)) still contains a newline character.

The relevant documentation is the description of the newline parameter of open(). The key point is this:

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

  • On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

  • On output, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
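To see the difference between newline=None and newline='' in practice, the sketch below writes a file with Windows-style '\r\n' endings as raw bytes (so nothing is translated on the way out) and reads it back both ways. It assumes Python 3; under Python 2 the built-in open() has no newline parameter, although io.open() does. The temporary file is just a stand-in for the real data file.

```python
import os
import tempfile

# Write raw bytes so no newline translation happens when writing
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"1,2\r\n3,4\r\n")

# newline=None (the default): '\r\n' is translated to '\n' on read
with open(path, newline=None) as f:
    translated = f.read()

# newline='': universal newlines still splits lines, but '\r\n' is kept as-is
with open(path, newline="") as f:
    untranslated = f.read()

os.remove(path)
print(repr(translated))    # '1,2\n3,4\n'
print(repr(untranslated))  # '1,2\r\n3,4\r\n'
```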

Solution:

# When you open a file in text mode in Python 3, all newline styles ('\r\n', '\r', '\n')
# are converted to '\n' (universal newlines), so the replace() below works on any OS.
def read_numbers(file):
    with open(file, 'r') as f:
        text = f.read()
        text = text.replace("\n", ",")

    numbers = []
    for s in text.split(','):
        try:
            numbers.append(int(s))
        except ValueError as e:
            print("error", e, "with", "(", s, ")")
    return numbers
Trần Đức Tâm

You might have a trailing comma at the end of a line. When the newline is replaced with a comma, this produces ",,", and split(',') then returns a list containing an empty string, which int() rejects with the invalid literal error.
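A quick way to see this effect, using a made-up two-line string in place of the file contents. The same thing happens if the file simply ends with a newline, because that final newline also becomes a trailing comma:

```python
# Hypothetical file contents: the first line ends with a trailing comma,
# and the text ends with a newline
text = "1,2,3,\n4,5,6\n"

parts = text.replace("\n", ",").split(",")
print(parts)  # ['1', '2', '3', '', '4', '5', '6', '']

try:
    int(parts[3])  # the empty string produced by ",,"
except ValueError as e:
    print(e)  # invalid literal for int() with base 10: ''
```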

As an alternative, you could use a regular expression to extract all the integers. This would then automatically cope with possible blank lines or missing values:

import re

def read_numbers(filename):
    with open(filename) as f:
        return list(map(int, re.findall(r'\d+', f.read())))
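A usage sketch for the regex version, with a made-up sample file containing a trailing comma, a blank line and a stray space. Be aware that r'\d+' matches only unsigned runs of digits, so a value such as -5 would be read as 5. If negative numbers can occur, a pattern like r'-?\d+' would be needed.

```python
import os
import re
import tempfile


def read_numbers(filename):
    with open(filename) as f:
        # every run of digits becomes one int; separators are ignored
        return list(map(int, re.findall(r'\d+', f.read())))


# Made-up sample: trailing comma, blank line and a space before a comma
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("1,2,3,\n\n4 ,5,6\n")

numbers = read_numbers(path)
os.remove(path)
print(numbers)  # [1, 2, 3, 4, 5, 6]
```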
Martin Evans