2

I am reading in a file with:

pd.read_csv("file.csv", dtype={'ID_1':float})

The file looks like:

ID_0, ID_1,ID_2
a,002,c
b,004,d
c,   ,e       
n,003,g

Unfortunately, read_csv fails, complaining that it can't convert ' ' to a float.

What is the right way to read in a csv and convert anything that can't be converted to a float into NaN?

smci
Simd
  • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html Take a look at the `converters` argument you can pass. – Ma0 Sep 12 '16 at 09:50
  • Does this answer your question? [Get pandas.read\_csv to read empty values as empty string instead of nan](https://stackoverflow.com/questions/10867028/get-pandas-read-csv-to-read-empty-values-as-empty-string-instead-of-nan) – dank8 Mar 03 '23 at 02:47

2 Answers

5

If you omit the dtype param and pass skipinitialspace=True, the whitespace-only field is read as empty and becomes NaN, so it will just work:

In [4]:
import io
import pandas as pd

t="""ID_0,ID_1,ID_2
a,002,c
b,004,d
c,   ,e
n,003,g"""

pd.read_csv(io.StringIO(t), skipinitialspace=True)
Out[4]:
  ID_0  ID_1 ID_2
0    a   2.0    c
1    b   4.0    d
2    c   NaN    e
3    n   3.0    g

So in your case, this should be enough:

pd.read_csv("file.csv", skipinitialspace=True)

You can see that the dtypes are as expected:

In [5]:
pd.read_csv(io.StringIO(t), skipinitialspace=True).info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
ID_0    4 non-null object
ID_1    3 non-null float64
ID_2    4 non-null object
dtypes: float64(1), object(2)
memory usage: 176.0+ bytes
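
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same approach. It uses an in-memory copy of the sample data (the `csv_text` name is just for illustration) and keeps the stray space before `ID_1` from the question's header, on the assumption that the real file has it too.

```python
import io

import pandas as pd

# In-memory stand-in for file.csv (illustrative name); note the space
# before ID_1 in the header, copied from the question.
csv_text = """ID_0, ID_1,ID_2
a,002,c
b,004,d
c,   ,e
n,003,g"""

# skipinitialspace=True drops whitespace that follows each delimiter, so the
# "   " cell is read as an empty field and the header " ID_1" becomes "ID_1".
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(df.dtypes)
print(df["ID_1"].isna().tolist())
```

Note that this only handles cells that are blank or whitespace after the delimiter; a cell holding some other non-numeric text would still leave the column as object.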
EdChum
  • I think it is skipinitialspace not skipinitialwhitespace which you have at the start of the answer. – Simd Sep 12 '16 at 13:14
4

This is my understanding of reading the documentation:

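def my_func(x):
    # Converters receive each raw cell as a string.
    try:
        converted_value = float(x)
    except ValueError:
        # Use a real float NaN, not the string 'NaN'.
        converted_value = float('nan')
    return converted_value

# Pass only the converter: pandas ignores dtype for a column that also has a converter.
pd.read_csv("file.csv", converters={'ID_1': my_func})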

(As I am at work now and don't have access to pandas I cannot tell you if it works but it looks as it should (said every programmer ever..))
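
Since the snippet above was posted untested, here is a small self-contained check of the converter idea. It is only a sketch: `to_float_or_nan` and `csv_text` are made-up names, the data is an in-memory copy of the sample from the question, and the fallback is a real float NaN so the column can stay numeric.

```python
import io

import pandas as pd


def to_float_or_nan(cell):
    """Hypothetical converter: return float(cell), or NaN if that fails."""
    try:
        return float(cell)
    except ValueError:
        # Whitespace-only or otherwise non-numeric text ends up here.
        return float("nan")


# In-memory stand-in for file.csv (illustrative name).
csv_text = """ID_0,ID_1,ID_2
a,002,c
b,004,d
c,   ,e
n,003,g"""

# The converter is applied to every raw string in the ID_1 column.
df = pd.read_csv(io.StringIO(csv_text), converters={"ID_1": to_float_or_nan})

print(df)
print(df["ID_1"].isna().tolist())
```

Unlike skipinitialspace, a converter like this also turns cells such as "n/a" or "abc" into NaN, at the cost of calling a Python function for every cell in that column.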

See also these relevant SO questions:

smci
Ma0