
I'm running the following python script:

#!/usr/bin/python

import os,sys
from scipy import stats
import numpy as np

f=open('data2.txt', 'r').readlines()
N=len(f)-1
for i in range(0,N):
    w=f[i].split()
    l1=w[1:8]
    l2=w[8:15]
    list1=[float(x) for x in l1]
    list2=[float(x) for x in l2]
    result=stats.ttest_ind(list1,list2)
    print result[1]

However, I get an error like:

ValueError: could not convert string to float: id

I'm confused by this. When I try the same thing for only one line in an interactive session, instead of using the for loop in the script:

>>> from scipy import stats
>>> import numpy as np
>>> f=open('data2.txt','r').readlines()
>>> w=f[1].split()
>>> l1=w[1:8]
>>> l2=w[8:15]
>>> list1=[float(x) for x in l1]
>>> list1
[5.3209183842, 4.6422726719, 4.3788135547, 5.9299061614, 5.9331108706, 5.0287087832, 4.57...]

It works well.

Can anyone explain a little bit about this? Thank you.

Rodrigo Vargas
LookIntoEast
    This kind of error (`ValueError: could not convert string to float: `) can also occur when casting a dataframe read from a `csv` file, e.g. with `df = df[['p']].astype({'p': float})`. If the `csv` was saved with empty cells written as spaces, python will not recognize the space character as NaN. You need to overwrite such cells with NaN first, using `df = df.replace(r'^\s*$', np.nan, regex=True)` – Alfred Wallace Apr 21 '21 at 14:55
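A small sketch of the fix described in that comment, using a made-up column named 'p':

import numpy as np
import pandas as pd

# made-up data: an empty cell was saved as a bare space character
df = pd.DataFrame({"p": ["1.5", " ", "2.25"]})
df = df.replace(r'^\s*$', np.nan, regex=True)   # whitespace-only cells become NaN
df = df[['p']].astype({'p': float})             # the cast now succeeds; NaN stays NaN
print(df)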

10 Answers

70

Obviously some of your lines don't contain valid float data; specifically, some line has the text id, which can't be converted to float.

When you try it at the interactive prompt you are only trying a single line, so the best way is to print the line number on which you get the error, and you will know which line is wrong, e.g.

#!/usr/bin/python

import os,sys
from scipy import stats
import numpy as np

f=open('data2.txt', 'r').readlines()
N=len(f)-1
for i in range(0,N):
    w=f[i].split()
    l1=w[1:8]
    l2=w[8:15]
    try:
        list1=[float(x) for x in l1]
        list2=[float(x) for x in l2]
    except ValueError,e:
        print "error",e,"on line",i
        continue  # skip this line so the t-test below isn't run on bad or missing data
    result=stats.ttest_ind(list1,list2)
    print result[1]
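If the bad line turns out to be a header row at the top of the file (the id in the error message suggests exactly that), a minimal sketch of a fix, under that assumption, is simply to start from the second line:

#!/usr/bin/python
from scipy import stats

# minimal sketch, assuming line 0 of data2.txt is a header row containing "id"
f = open('data2.txt', 'r').readlines()
for i in range(1, len(f)):              # start at 1 to skip the header line
    w = f[i].split()
    list1 = [float(x) for x in w[1:8]]
    list2 = [float(x) for x in w[8:15]]
    print(stats.ttest_ind(list1, list2)[1])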
Anurag Uniyal
36

My error was very simple: the text file containing the data had some space characters (hence not visible) on the last line.

As output from grep, I had `45 ` (with a trailing space) instead of just `45`.
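If trailing whitespace or a blank last line is the culprit, a minimal defensive sketch (file name taken from the question) is to strip each line and skip empty ones before converting:

# strip whitespace before converting, and skip blank or whitespace-only lines
for line in open('data2.txt', 'r'):
    line = line.strip()
    if not line:
        continue                      # ignore empty lines, e.g. a trailing newline
    values = [float(x) for x in line.split()]
    print(values)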

Zoe
Sopalajo de Arrierez
    Spaces and tabs are visible ;) End-of-lines and alikes are not, for example, characters `\n`,`\r`. – Oleg Melnikov Dec 09 '17 at 23:30
  • I guess this is the point in time when most people figure out that [Lib/re.py](https://docs.python.org/3.6/library/re.html) and .replace(' ', '') exist. – Ole Aldric May 31 '18 at 11:14
23

This error is pretty verbose:

ValueError: could not convert string to float: id

Somewhere in your text file, a line has the word id in it, which can't really be converted to a number.

Your test code works because the word id isn't present in line 2.


If you want to catch that line, try this code. I cleaned your code up a tad:

#!/usr/bin/python

import os, sys
from scipy import stats
import numpy as np

for index, line in enumerate(open('data2.txt', 'r').readlines()):
    w = line.split()  # split on any whitespace, as in the original script
    l1 = w[1:8]
    l2 = w[8:15]

    try:
        list1 = map(float, l1)
        list2 = map(float, l2)
    except ValueError:
        print 'Line {i} is corrupt!'.format(i=index)
        break

    result = stats.ttest_ind(list1, list2)
    print result[1]
Blender
17

For a Pandas dataframe with a column of numbers with commas, use this:

df["Numbers"] = [float(str(i).replace(",", "")) for i in df["Numbers"]]

So values like 4,200.42 would be converted to 4200.42 as a float.

Bonus 1: This is fast.

Bonus 2: The resulting float column is more space efficient if you save that dataframe in something like Apache Parquet format.
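A small self-contained example of that conversion, with made-up values:

import pandas as pd

# made-up data: numbers stored as strings with thousands separators
df = pd.DataFrame({"Numbers": ["4,200.42", "1,000", "37.5"]})
df["Numbers"] = [float(str(i).replace(",", "")) for i in df["Numbers"]]
print(df["Numbers"].dtype)   # float64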

Contango
8

Perhaps your numbers aren't actually numbers, but letters masquerading as numbers?

In my case, the font I was using meant that "l" and "1" looked very similar. I had a string like 'l1919' which I thought was '11919' and that messed things up.
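A quick way to spot a look-alike character in a suspicious token (the example string is the one from this answer):

# list any character in the token that isn't a digit, sign, or decimal point
token = 'l1919'
suspects = [c for c in token if c not in '0123456789+-.']
print(suspects)   # ['l'] -- a lowercase L masquerading as a one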

Tom Roth
7

Your data may not be what you expect -- it seems you're expecting, but not getting, floats.

A simple solution to figuring out where this occurs would be to add a try/except to the for-loop:

for i in range(0,N):
    w=f[i].split()
    l1=w[1:8]
    l2=w[8:15]
    try:
      list1=[float(x) for x in l1]
      list2=[float(x) for x in l2]
    except ValueError, e:
      # report the error in a helpful way -- print the offending line number
      print "error on line", i, ":", e
      continue  # skip this line rather than running the t-test on bad data
    result=stats.ttest_ind(list1,list2)
    print result[1]
Matt Fenwick
5

Shortest way:

df["id"] = df['id'].str.replace(',', '').astype(float) - if ',' is the problem

df["id"] = df['id'].str.replace(' ', '').astype(float) - if blank space is the problem

2

Replace empty string values with 0.0: if you know the possible non-float values, update them first, then cast.

df.loc[df['score'] == '', 'score'] = 0.0


df['score']=df['score'].astype(float)
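An alternative sketch (not from the answer above) using pd.to_numeric, which coerces anything non-numeric to NaN and then fills it, assuming the same 'score' column:

import pandas as pd

# made-up frame with an empty string in the score column
df = pd.DataFrame({"score": ["3.5", "", "4.0"]})
df["score"] = pd.to_numeric(df["score"], errors="coerce").fillna(0.0)
print(df["score"].tolist())   # [3.5, 0.0, 4.0]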
Ramesh Ponnusamy
1

I solved a similar situation with a basic technique using pandas. First, load the file with pandas; it's pretty simple:

data=pd.read_excel('link to the file')

Then set the index of data to the relevant column that needs to be cleaned. For example, if your data has ID as one of its columns, set the index to ID.

 data = data.set_index("ID")

Then delete all the rows that have "id" as the value instead of a number, using the following command:

  data = data.drop("id", axis=0)

Hope this helps.
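Putting those steps together, a minimal sketch with made-up data (column names are only illustrative):

import pandas as pd

# made-up frame where a stray header-like row repeats the column names
data = pd.DataFrame({"ID": ["id", "1", "2"], "value": ["value", "3.4", "5.6"]})
data = data.set_index("ID")
data = data.drop("id", axis=0)                 # remove the row labelled "id"
data["value"] = data["value"].astype(float)    # the cast now succeeds
print(data)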

Kapilfreeman
0

A good option for handling these kinds of erroneous values is to remove them at the read_csv step by specifying na_values, which tells pandas which additional strings to recognize as NA/NaN.

By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'None', 'n/a', 'nan', 'null'. So in your case, since it's complaining about the string 'id' in the data, you could do the following:

df = pd.read_csv('file.csv', na_values = ['id'])

This will treat cells containing 'id' as null and resolve the ValueError when running analysis on the column of interest.
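After reading with na_values, the affected rows can be dropped before analysis; a short sketch, assuming a hypothetical column named 'score' in the same file.csv:

import pandas as pd

# 'id' entries in the file become NaN on read, then the rows get dropped
df = pd.read_csv('file.csv', na_values=['id'])
df = df.dropna(subset=['score'])          # hypothetical column of interest
df['score'] = df['score'].astype(float)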

Sherry