Pandas: ValueError: cannot convert float NaN to integer

Question

I get ValueError: cannot convert float NaN to integer for the following:

df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)

"x" is a column in the CSV file. I cannot spot any float NaN in the file, and I don't understand the error or why I am getting it.
When I read the column as a string, it has values like -1, 0, 1, ... 2000, which all look like perfectly good integers to me.
When I read the column as float, it loads fine and shows values such as -1.0, 0.0, etc., but there are still no NaNs.
I tried error_bad_lines=False and the dtype parameter in read_csv, to no avail. Loading just aborts with the same exception.
The file is not small (10+ M rows), so I cannot inspect it manually. When I extract a small part from the top of the file there is no error, but it happens with the full file. So it is something in the file, but I cannot detect what.
Logically the CSV should not have missing values, but even if there is some garbage I would be OK with skipping those rows, or at least identifying them. However, I do not see a way to scan through the file and report conversion errors.

Update: Using the hints in comments/answers I got my data clean with this:

# x contained NaN
df = df[~df['x'].isnull()]

# Y contained some other garbage, so null check was not enough
df = df[df['y'].str.isnumeric()]

# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)

You need to figure out what you want to do with any NaNs, and then do it. — cs95, Nov 16 '17 at 15:20
thanks @jezrael , now df[df['x'].isnull()] did identify a row with "NaN" and I could remove it! Now with another similar field, there seems to be some other garbage which is not int. Is there a generic way to find rows which are not convertible to a given datatype, so I can identify and discard them all? — JaakL, Nov 16 '17 at 15:38
Use `pd.to_numeric` with `errors='coerce'` instead of astype int, then `fillna` with whatever you want. — Bharath M Shetty, Nov 16 '17 at 15:40
In v0.24, pandas introduces Nullable Integer Types which support Integer columns with NaNs. See [this answer](https://stackoverflow.com/a/55704512/4909087) for more information. — cs95, Apr 16 '19 at 09:49
I came to this post because I was getting the same error, but in my case it was resolved when I reset the dataframe index with df = df.reset_index(drop=True). Just commenting here in case someone with a similar issue reads this. — DOT, Dec 14 '21 at 14:41
Does this answer your question? [Get pandas.read\_csv to read empty values as empty string instead of nan](https://stackoverflow.com/questions/10867028/get-pandas-read-csv-to-read-empty-values-as-empty-string-instead-of-nan) — dank8, Mar 03 '23 at 02:48

score 103 · Accepted Answer · edited Jun 18 '20 at 18:50

To identify NaN values, use boolean indexing:

print(df[df['x'].isnull()])

Then, to deal with non-numeric values, use to_numeric with the parameter errors='coerce', which replaces every non-numeric value with NaN:

df['x'] = pd.to_numeric(df['x'], errors='coerce')

To remove all rows with NaNs in column x, use dropna:

df = df.dropna(subset=['x'])

Finally, convert the values to ints:

df['x'] = df['x'].astype(int)
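
The steps above can be combined so that the offending rows are reported before they are dropped. This is only a minimal sketch: the small frame below is made-up sample data standing in for zoom11.csv, and the names `converted`, `bad_rows` and `clean` are illustrative.

```python
import pandas as pd

# Hypothetical sample standing in for the 'x' column read as strings.
df = pd.DataFrame({"x": ["-1", "0", "12", "abc", None, "2000"]})

# Coerce: anything that cannot be parsed as a number becomes NaN.
converted = pd.to_numeric(df["x"], errors="coerce")

# Rows that failed conversion (this includes values that were already missing).
bad_rows = df[converted.isna()]
print(bad_rows)

# Keep only the convertible rows, then cast to a plain integer dtype.
clean = df.loc[converted.notna()].copy()
clean["x"] = converted[converted.notna()].astype(int)
print(clean["x"].tolist())
```

Inspecting `bad_rows` before calling dropna shows what the garbage actually looks like, which may be useful when the file is too large to check by hand.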

answered Nov 16 '17 at 15:42 by jezrael · edited Jun 18 '20 at 18:50 by nellac77

thanks, this was ok. I updated my question with my lines. The final thing I do not understand is why I get False for negative numbers: `'-1'.isnumeric()`? Not an issue for my data, which had x and y >= 0, but it is still a general question, as I do not see it in the official documentation. – JaakL Nov 16 '17 at 16:03
you're probably seeing that because python is interpreting `'-1'` as a string, which is not a number – Ben Jun 21 '18 at 18:09
I was having a `df.max()` on an empty `df`, thanks for your suggestion – Vzzarr May 17 '23 at 13:20
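
On the `'-1'.isnumeric()` question raised in the comments: `str.isnumeric()` returns True only when every character in the string is a numeric character, so a minus sign or a decimal point makes it return False. A small sketch on made-up values, comparing it with `pd.to_numeric`:

```python
import pandas as pd

# Made-up values: a negative integer, a decimal, and a non-number.
s = pd.Series(["-1", "0", "1.5", "2000", "abc"])

# Character-level check: '-' and '.' are not numeric characters.
char_check = s.str.isnumeric().tolist()
print(char_check)

# Value-level check: parse each string as a number, NaN on failure.
parsed = pd.to_numeric(s, errors="coerce")
value_check = parsed.notna().tolist()
print(value_check)
```

For data that may contain negative numbers, filtering on the result of `pd.to_numeric` is therefore likely the safer choice than filtering on `str.isnumeric()`.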

score 38 · Answer 2 · answered Apr 16 '19 at 09:08

ValueError: cannot convert float NaN to integer

From v0.24 onward, you actually can: pandas introduced nullable integer data types, which allow integers to coexist with NaNs.

Given a series of whole float numbers with missing data,

s = pd.Series([1.0, 2.0, np.nan, 4.0])
s

0    1.0
1    2.0
2    NaN
3    4.0
dtype: float64

s.dtype
# dtype('float64')

You can convert it to a nullable int type (choose from one of Int16, Int32, or Int64) with,

s2 = s.astype('Int32') # note the 'I' is uppercase
s2

0      1
1      2
2    NaN
3      4
dtype: Int32

s2.dtype
# Int32Dtype()

Your column needs to have whole numbers for the cast to happen. Anything else will raise a TypeError:

s = pd.Series([1.1, 2.0, np.nan, 4.0])

s.astype('Int32')
# TypeError: cannot safely cast non-equivalent float64 to int32
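
If the column may contain fractional values, one option is to check for them first and round before casting. This is a sketch only; it assumes that rounding to the nearest integer is acceptable for the data, which the question does not state.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.1, 2.0, np.nan, 4.0])

# True only if every non-missing value is a whole number.
is_whole = bool((s.dropna() % 1 == 0).all())
print(is_whole)

# Assumption: rounding is acceptable. After rounding, the cast succeeds
# and the missing value stays missing.
s2 = s.round().astype("Int32")
print(s2)
```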

answered Apr 16 '19 at 09:08 by cs95

I get an error saying `TypeError: object cannot be converted to an IntegerDtype` do you have any idea what this means? – Ken May 06 '21 at 04:35
Thanks for calling out "note the 'I' is uppercase". That was my problem. – sql_knievel Apr 30 '22 at 16:35
@Ken, I solved this using `s.astype(float).astype('Int32')` – BoomBoxBoy Jan 24 '23 at 00:29

score 29 · Answer 3 · edited Jan 20 '21 at 14:08

Also, even in the latest versions of pandas, if the column is of object type you have to convert it to float first, something like:

df['column_name'].astype(float).astype("Int32")

NB: You have to go through a NumPy float dtype first and then to the nullable Int32, for some reason.

Whether the int should be 32 or 64 bits depends on your variable; be aware you may lose some precision if your numbers are too big for the format.
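
For an object column that holds numeric strings, `pd.to_numeric` is another way to reach a float column before the nullable cast. A minimal sketch on a made-up series; `col`, `as_float` and `as_int` are illustrative names:

```python
import numpy as np
import pandas as pd

# Made-up object column with numeric strings and one missing value.
col = pd.Series(["1", "2", None, "4"], dtype=object)

# Step 1: object -> float64 (missing or unparseable entries become NaN).
as_float = pd.to_numeric(col, errors="coerce")

# Step 2: float64 -> nullable integer.
as_int = as_float.astype("Int64")
print(as_int)

# Largest value a 32-bit integer can hold, for choosing between Int32 and Int64.
int32_max = np.iinfo(np.int32).max
print(int32_max)
```

Comparing the column's maximum against `int32_max` is one way to decide whether Int32 is wide enough.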

answered Feb 07 '20 at 09:21 by Luiz Fernando Lobo · edited Jan 20 '21 at 14:08 by larslovlie

It is better to use df['column_name'].astype('float').astype('Int32') – Keith Jan 06 '23 at 22:32

score 9 · Answer 4 · answered Jul 17 '18 at 14:54

I know this has been answered, but I wanted to provide an alternate solution for anyone in the future:

You can use .loc to subset the dataframe by only values that are notnull(), and then subset out the 'x' column only. Take that same vector, and apply(int) to it.

If column x is float:

df.loc[df['x'].notnull(), 'x'] = df.loc[df['x'].notnull(), 'x'].apply(int)

answered Jul 17 '18 at 14:54 by Matt W.

the left part does what it should, but in the df it stays formatted as float. (Python 3.6, Pandas 0.22) – InLaw Aug 16 '18 at 07:35
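
The behaviour described in this comment can be reproduced on a small made-up frame: the column still contains NaN, so it keeps a float dtype after the partial assignment. The cast to a nullable integer at the end is a swapped-in alternative, not part of this answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0]})
mask = df["x"].notnull()

# Assign ints back into the non-null rows only.
df.loc[mask, "x"] = df.loc[mask, "x"].apply(int)
print(df["x"].dtype)  # a float dtype: the NaN row keeps the column floating point

# Alternative (pandas 0.24 or later): a nullable integer column.
nullable = df["x"].astype("Int64")
print(nullable)
```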

score -1 · Answer 5 · answered Apr 28 '19 at 04:16

If the column contains null values, performing this operation on it raises the error. To resolve it, use df[~df['x'].isnull()][['x']].astype(int) if you want your original dataset to stay unchanged.

answered Apr 28 '19 at 04:16 by SATYAJIT MAITRA
