I am trying to map data that I get from another function onto my pandas DataFrame. I create a simple dict for the lookup, but for some reason the map function always produces float64 values instead of integers.

So here I create a dictionary for the map function and force the ID to be an int.

all_serials = {phone["serial"]:int(phone["id"]) for phone in all_phones}
df["id"] = df["serial"].map(all_serials)

But still the "id" field is a float64 afterwards.

I have also tried the astype function:

all_serials = {phone["serial"]:int(phone["id"]) for phone in all_phones}
df["id"] = df["serial"].map(all_serials).astype(int)

But then I get "Cannot convert non-finite values (NA or inf) to integer" because of the rows with no match (NaN as value).

Empusas

1 Answer

As RomanPerekhrest stated, pandas cannot store NaN in a column that uses the default int dtype, so it automatically converts the whole column to float. See this post about NaN values and ints.
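To see that conversion in isolation, here is a minimal sketch. The variable names `complete` and `with_gap` are made up for illustration and are not part of the question:

```python
import pandas as pd

# A Series built only from integers keeps an integer dtype.
complete = pd.Series([1, 3])

# The same data with one missing value is stored as float64,
# because the missing entry is represented as NaN (a float).
with_gap = pd.Series([1, 3, None])

print(complete.dtype)
print(with_gap.dtype)
```

The same thing happens inside map: every key that is missing from the lookup becomes NaN, and that one NaN is enough to make the whole result float.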

Take this mapping example:

import pandas as pd

num1 = [1111, 3333, 4444, 7777]
num2 = [1, 3, 4, 7]
linkage = pd.DataFrame({"num1":num1, "num2":num2})

num1 = [1111, 3333, 5555, 8888]
df = pd.DataFrame({"num1":num1})

mapper = linkage.set_index('num1')['num2']
df['num2'] = df.num1.map(mapper)

Output:

    num1    num2
0   1111    1.0
1   3333    3.0
2   5555    NaN
3   8888    NaN

Because 5555 and 8888 in df are not found in linkage, map fills those rows with NaN. As soon as a NaN lands in the column, pandas converts the rest of the column to float, which is its default behavior. You can convert the column back to integers by changing its type to Int64 (capital I), the nullable integer dtype. Converting to plain int instead raises the error you saw above.

df.num2 = df.num2.astype('Int64')

Output:

    num1    num2
0   1111    1
1   3333    3
2   5555    <NA>
3   8888    <NA>
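
If you need a plain int column rather than the nullable Int64 type, one alternative is to fill the unmatched rows with a sentinel value before converting. This is only a sketch under an assumption: the sentinel -1 is an arbitrary choice of mine and only works if -1 can never be a real id in your data.

```python
import pandas as pd

linkage = pd.DataFrame({"num1": [1111, 3333, 4444, 7777], "num2": [1, 3, 4, 7]})
df = pd.DataFrame({"num1": [1111, 3333, 5555, 8888]})
mapper = linkage.set_index("num1")["num2"]

# Unmatched keys become NaN, so the mapped result is float64.
mapped = df["num1"].map(mapper)

# Option 1: nullable integer dtype, unmatched rows stay missing (<NA>).
nullable = mapped.astype("Int64")

# Option 2: replace missing values with a sentinel (-1 is an assumption),
# after which a plain int conversion no longer raises.
sentinel = mapped.fillna(-1).astype(int)

print(nullable)
print(sentinel)
```

Option 1 keeps the information that a row had no match, while option 2 gives you a regular NumPy-backed integer column at the cost of a magic value you have to remember.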
Michael S.