0

I am having a bit of trouble reading data from a CSV file as an integer.

Here is an example of my CSV file:

Col1    Col2    Col3
Header  Header  Header
Header  1       1,000,000
BLANK   2       500,000
BLANK   3       200,000

What I am wanting to do is read in the data from 'Col3' (not including the header) as integers into a tuple.

Here is my code:

import csv
prizePoints = []
with open("csvfile.csv") as prizes:
    next(prizes)
    for row in prizes:
        prizePoints.append(row)

When I try this, I get an error that says:

ValueError: invalid literal for int() with base 10: '"1'

Example of printed row:

['', '1', '1,000,000']    

I think it is due to the values in 'Col3' having commas. However, I am unsure of how to fix this so any help would be greatly appreciated!

P.S. I can't change the format of the values for 'Col3' so that they do not have commas.

Gino Mempin
  • 25,369
  • 29
  • 96
  • 135
Sean
  • 157
  • 1
  • 2
  • 14
  • Can you please do `for row in csv.reader(prizes):` then `print(row)` then `break` (i.e. so you get a printout of an example row after it's been processed by `csv.reader()` and then `break`) and give an example row? Presumably you want `1,000,000` to be read as `1000000` integer? – roganjosh Mar 14 '18 at 22:28
  • Hey I've edited the original post to show an example of a printed row :D – Sean Mar 14 '18 at 22:46
  • Thanks for responding to my comment. However, @abanert has highlighted a number of issues that amount to a bit of a perfect storm - you dodge some bullets and hit others. It's worth taking time to address those one-by-one. – roganjosh Mar 14 '18 at 22:47

2 Answers2

3

You've got multiple problems here.

  • In your comprehension, you do for row in prizes. Since prizes is the file object, this consumes all of the remaining lines in the file, while you're processing the first row.
  • The same thing also hides the row from the csv.reader.
  • You're then spliting each row. If you did the csv.reader part right, the row would be a list, not a string, so there'd be nothing to split.
  • Your CSV header appears to be two lines long. You don't skip the header. You're kind of getting lucky that the first error cancels this one out, so you never end up trying to process the header as an int.
  • Then, finally, you'll get to the point where you try to call int on 1,000,000. This one is pretty easy to solve. For a quick&dirty solution, you can just do int(s.replace(',', '').
abarnert
  • 354,177
  • 51
  • 601
  • 671
  • Hey thanks for the reply! Sorry I am quite new to using csv files and that with Python so I don't really understand how to fix these problems, like when you say I shouldn't use "for row in csv.reader(prizes):" I don't know of any other way to read data from a csv file – Sean Mar 14 '18 at 22:52
  • @Sean But I didn't say you shouldn't use `for row in csv.reader(prizes):` anywhere. – abarnert Mar 14 '18 at 23:02
  • I guess I just don't understand how to fix these issues that have been highlighted – Sean Mar 14 '18 at 23:11
  • I have tried multiple ways to try and read in the column with the desired data however, I can't figure out how to read in only that column and the data as an integer as oppose to a string, I have edited my original post to show something else I have tried – Sean Mar 14 '18 at 23:14
  • This "quick & dirty solution" is nice in theory, but dangerous in practice. How do you know which locale was used for the file? It's certainly not "guaranteed to be en_US". Especially if that machine is a publicly accessible server. Why not give the user the power to choose? – Peter Gibson Mar 14 '18 at 23:21
  • @PeterGibson There’s no getting around the fact that you can’t parse your data if you don’t know the format. A quick&dirty solution can be useful for interactively experimenting to figure out the format if you don’t have a spec, and it can be the right solution once you know the format and know it will work. Trying to write code that will work for any data in the universe just means you get code that won’t work for anything, except very rarely by accident. This particular file is guaranteed to be en_US, at least as much as it’s guaranteed to be these 3 columns; beyond this file, who knows? – abarnert Mar 14 '18 at 23:28
0

Consider using the locale module when parsing the number - this will help if you run into the problem where some regions switch the thousands and decimal separator

https://stackoverflow.com/a/2954231/66349

Peter Gibson
  • 19,086
  • 7
  • 60
  • 64
  • This is nice in theory, but rarely useful in practice. How do you know which locale to use for the file? It's certainly not "the locale for whatever machine this script happens to be running on". Especially if that machine is a Windows, Mac, typical linux cloud server, or typical linux container US-English system, where `en_US` won't be set so `locale.atoi` will just fall back to calling `int`. – abarnert Mar 14 '18 at 23:09