Invalid literal for float(): 0.000001, how to fix error?

Question

I have a .csv file containing 3 columns of data. I need to create a new output file that includes a specific subset of the data from the first and third columns of the original file. The third column contains decimal values, so I believe I have to use Python's float() function to convert them. I have tried the following code:

in_file = open("filename.csv", "r")
out_file = open("output.csv", "w")

while True:
    line = in_file.readline()
    if line == '':  # end of file
        break
    line = line.strip()
    items = line.split(',')
    gi_name = items[0]
    if gi_name.startswith("_"):  # skip names beginning with an underscore
        continue
    p_value = float(items[2])
    if p_value > 0.05:  # keep only p-values of at most 0.05
        continue
    out_file.write(','.join([gi_name, str(p_value)]) + '\n')

in_file.close()
out_file.close()

When I run the above, I receive the following error:

Error: invalid literal for float(): 0.000001

The value 0.000001 is the first value in the third column of my data set, so I guess the code cannot read beyond it, but I'm not sure why. I am new to Python and don't really understand why I am getting this error or how to fix it. I have tried other ways of passing the value to float(), but without success. Does anyone know how I might be able to fix this?

Have you considered using the [`csv` module](http://docs.python.org/library/csv.html)? — Greg Hewgill, Mar 28 '12 at 23:52
Adding a few lines of your CSV file to the question would be helpful for reproduction. — David Alber, Mar 28 '12 at 23:59

score 5 · Accepted Answer · answered Mar 28 '12 at 23:54

From what you've posted, it's not clear whether there is something subtly wrong with the string you're trying to pass to float() (because it looks perfectly reasonable). Try adding a debug print statement:

print(repr(items[2]))
p_value = float(items[2])

Then you can determine exactly what is being passed to float(). The call to repr() will make even normally invisible characters visible. Add the result to your question and we will be able to comment further.
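
One way to make that debugging step permanent is to wrap the conversion so that a failure always reports the repr() of the input. This is only a sketch, assuming Python 3; the helper name parse_float and the sample value are made up for illustration and are not taken from the asker's file.

```python
def parse_float(text):
    """Convert text to float; on failure, report the exact input via repr()."""
    try:
        return float(text)
    except ValueError:
        # %r formats the value with repr(), so control characters such as
        # \r or \x00 appear as visible escape sequences in the message.
        raise ValueError("cannot parse %r as float" % (text,)) from None


good = parse_float("0.000001")

# Hypothetical field with a carriage return hidden in the middle.
try:
    parse_float("0.000001\rnext")
    message = None
except ValueError as exc:
    message = str(exc)

print(good)
print(message)
```

With a wrapper like this, the traceback itself shows the hidden character, so no separate print statement is needed.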

Greg Hewgill

951,095
183
1,149
1,285

Thank you Greg. When I added the print(repr(items[2])) line, it printed the following: '1.10E-06\rGene2' Traceback (most recent call last): File "s6help.py", line 13, in <module> p_value = float(items[2]) So it seems I have a \rGene2 hidden in items[2]. My code calls .strip(), which I thought would remove the \r and \n. I changed it to .strip(\r), but that still did not remove it. I don't know what else to do; do you have any more ideas? – student001 Mar 29 '12 at 01:09
Well, that's definitely the problem. Note that `.strip()` only removes whitespace from the *ends* of the string, while your `\r` is in the middle of the string. You're now going to have to look at the CSV file format and the code you use to read the file. It's possible that your file might have only `\r` line endings, which isn't supported by default in Python. Does that seem likely? – Greg Hewgill Mar 29 '12 at 01:17
Yes, this is possible, and I believe this is the problem. My line endings contain \r, and any attempt to remove or replace them only results in one long line, which is not what I want. Any suggestion on how to remove the \r but still keep separate rows? – student001 Mar 29 '12 at 02:13
Use `\n` instead of `\r`. The `\r` by itself is not a usual line terminator. Python normally handles both `\n` and `\r\n` (but `\n` is preferred). – Greg Hewgill Mar 29 '12 at 02:27
Thank you so much! I was able to get the code to work simply by using the 'rU' read argument instead of just 'r', which basically removes the \r issue. Thank you so much, I don't know if I ever would have figured that out on my own! – student001 Mar 29 '12 at 03:19
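
For readers on Python 3, where the 'U' flag is deprecated and no longer accepted by recent versions, a rough equivalent is to let the csv module (suggested in the comments on the question) deal with the line endings. The sketch below is an assumption-laden illustration, not the asker's final code: the function name filter_rows, the file names, and the sample rows are all made up. It also shows why .strip() did not help, since it only trims the ends of a string.

```python
import csv
import os
import tempfile


def filter_rows(in_path, out_path, threshold=0.05):
    """Write (name, p-value) pairs whose p-value does not exceed threshold."""
    # newline="" hands line-ending handling to the csv module, which
    # accepts \r, \n, and \r\n as row terminators when reading.
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        for row in csv.reader(src):
            if len(row) < 3 or row[0].startswith("_"):
                continue  # skip short rows and names starting with "_"
            p_value = float(row[2])
            if p_value <= threshold:
                writer.writerow([row[0], str(p_value)])


# strip() trims only the ends, so an interior \r survives;
# splitlines() treats a bare \r as a line boundary.
sample = "1.10E-06\rGene2"
stripped = sample.strip()
pieces = sample.splitlines()

# Made-up data with \r-only line endings, written as bytes so that
# the carriage returns reach the file unchanged.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "input.csv")
    out_path = os.path.join(tmp, "output.csv")
    with open(in_path, "wb") as f:
        f.write(b"Gene1,x,1.10E-06\rGene2,y,0.2\r_skip,z,0.01\rGene3,w,0.03\r")
    filter_rows(in_path, out_path)
    with open(out_path, newline="") as f:
        result = f.read()

print(repr(result))
```

Opening the input with the default newline=None would also translate a bare \r to \n in plain text mode; newline="" is used here because that is what the csv documentation recommends for files passed to csv.reader.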

score 1 · Answer 2 · answered Mar 29 '12 at 00:00

Your file most likely has some unprintable character that is read. Try this:

>>> a = '0.00001\x00'
>>> a
'0.00001\x00'
>>> print(a)
0.00001
>>> float(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for float(): 0.00001

You can see that a contains a NUL character, which is visible neither in the output of print nor in the exception message raised by float().
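
To locate such characters rather than just confirm they exist, something along these lines may help. It is a sketch assuming Python 3; hidden_chars is a hypothetical helper name, and str.isprintable() is used as a rough test for "invisible" characters.

```python
def hidden_chars(text):
    """Return (index, repr) pairs for every non-printable character in text."""
    return [(i, repr(ch)) for i, ch in enumerate(text) if not ch.isprintable()]


# The NUL example from above, plus a value with an embedded carriage return.
found_nul = hidden_chars("0.00001\x00")
found_cr = hidden_chars("1.10E-06\rGene2")
clean = hidden_chars("0.00001")

print(found_nul)
print(found_cr)
print(clean)
```

An empty list means every character is printable, so the cause of a float() failure would have to lie elsewhere, for example in stray letters or a misplaced separator.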
