I am given a csv file which contains numbers ranging from 800 to 3000. The problem is that numbers greater than a thousand have a comma in them, e.g. 1,227 or 1,074 or 2,403. When I try to calculate their mean, variance or standard deviation using scipy or numpy, I get the error: ValueError: could not convert string to float: '1,227'. How can I convert them to numbers so that I can do calculations on them? The CSV file must not be changed, as it is a read-only file.
-
You haven't shown any code. There are loads of ways to do this, depending on your actual approach to reading the csv – roganjosh Oct 07 '17 at 18:35
-
This isn't a formatting issue but rather a reading issue - how to load a `csv` into an array. https://stackoverflow.com/questions/6633523/how-can-i-convert-a-string-with-dot-and-comma-into-a-float-number-in-python has `replace` and `locale` solutions. – hpaulj Oct 07 '17 at 19:21
-
How about writing a new version of the file without commas? `tr -d ',' < originalFile.csv > noCommas.csv`? – Mark Setchell Oct 07 '17 at 21:14
-
`my_string=[val[2] for val in csvfile]` `my_float=[float(my_string.replace(',', '')) for i in my_string)]` This is what I am trying to do. `my_string` holds a list of strings, i.e. the numbers with commas, and I am trying to convert it to `my_float`, where `replace` would have worked. Since it is a list of strings, though, this code is not working. – Said Akbar Oct 07 '17 at 23:04
2 Answers
Thanks, guys! I fixed it by using the `replace` function. hpaulj's link was useful.
my_string=[val[2] for val in csvtext]
my_string=[x.replace(',', '') for x in my_string]
my_float=[float(i) for i in my_string]
In this code, the first line loads the third column of the csv rows (a list of strings) into my_string, the second line removes the commas, and the third line converts the strings into floats that can be used in calculations. So there is no need to edit the file or create a new one; a simple list manipulation does the job.
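A self-contained sketch of the same approach, using the standard `csv` module to read the rows and the `statistics` module for the calculations. The sample data, the helper name `read_column` and the column index 2 are assumptions for illustration. Note that in a comma-delimited CSV a value such as `1,227` has to be quoted, otherwise the comma would split it into two fields.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for the read-only file; the third
# column holds the numbers, quoted because they contain commas.
SAMPLE = '''id,name,value
1,a,"1,227"
2,b,"1,074"
3,c,"2,403"
4,d,850
'''


def read_column(text: str, index: int = 2) -> list[float]:
    """Return column `index` as floats, dropping the thousands separators."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    strings = [row[index] for row in reader]
    return [float(s.replace(',', '')) for s in strings]


values = read_column(SAMPLE)
print(values)
print(statistics.mean(values), statistics.pstdev(values))
```

For the real file, pass `open(path, newline='').read()` instead of `SAMPLE`; the file is only opened for reading, so it stays unchanged. The resulting list can also be passed to `numpy.mean` or `numpy.std`.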

Said Akbar
This really is a locale issue, but a simple solution is to call `replace` on the string first:
a = '1,274'
float(a.replace(',','')) # 1274.0
Another way is to use `pandas` to read the csv file. Its `read_csv` function has a `thousands` argument.
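A minimal sketch of the `pandas` route, assuming a comma-delimited file in which the comma-containing numbers are quoted; the sample data and the column name `value` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample; values above 999 are quoted because the
# field delimiter is also a comma.
SAMPLE = '''id,value
1,"1,227"
2,"1,074"
3,"2,403"
4,850
'''

# thousands=',' makes the parser drop the separator before converting,
# so the column comes back numeric instead of as strings.
df = pd.read_csv(io.StringIO(SAMPLE), thousands=',')
print(df["value"].dtype)
print(df["value"].mean(), df["value"].std())
```

Note that `Series.std` computes the sample standard deviation (`ddof=1`) by default, whereas `numpy.std` defaults to the population version (`ddof=0`).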
If you do know something about the locale, then it's probably best to use the `locale.atof()` function.
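A sketch of the `locale.atof()` route. It assumes an English locale named `en_US.UTF-8` is installed (locale names vary by platform, and `setlocale` raises `locale.Error` when a name is unavailable); the helper name `parse_localized` is hypothetical.

```python
import locale


def parse_localized(text: str, loc: str = "en_US.UTF-8") -> float:
    """Convert a string such as '1,227' to float using `loc`'s separators.

    Raises locale.Error if `loc` is not installed on this system.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # query without changing
    try:
        locale.setlocale(locale.LC_NUMERIC, loc)
        # atof strips the locale's thousands separator, maps its decimal
        # point to '.', and then calls float().
        return locale.atof(text)
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # restore old setting


try:
    print(parse_localized("1,227"))
except locale.Error:
    print("en_US.UTF-8 is not available on this system")
```

`setlocale` changes a process-wide setting and is not thread-safe, so in threaded code the plain `replace` approach may be the safer choice.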

Bart Van Loon
-
Not if you use numpy to read in the CSV, or even the base CSV module. You need clarification from OP to hope to answer this. – roganjosh Oct 07 '17 at 18:36
-
I agree. The question isn't very clear. However, the ValueError message does indicate that he is dealing with numbers as strings. – Bart Van Loon Oct 07 '17 at 18:38
-
Then don't shoot for an answer. Ask for clarification first. Rep gain is secondary to providing something that's useful. – roganjosh Oct 07 '17 at 18:38
-
I found an old SO question that gives essentially these two answers. But if `pandas` is available, then I'd use that. – hpaulj Oct 07 '17 at 19:24