I have noticed a behaviour that I don't quite understand.

I am converting a list of dataclass items into a DataFrame. When none of the values is None, everything works as expected:

from dataclasses import dataclass
from dataclasses import asdict
from pandas import json_normalize

@dataclass
class TestItem:
    name: str = None
    id: int = None


test_item_list = [
    TestItem(name='teapot', id=11),
    TestItem(name='kettle', id=12),
    TestItem(name='boiler', id=13)
]

df = json_normalize(asdict(item) for item in test_item_list)
print(df)

The result is as intended:

     name  id
0  teapot  11
1  kettle  12
2  boiler  13

But if test_item_list is changed so that one item has no id:

test_item_list = [
    TestItem(name='teapot', id=11),
    TestItem(name='kettle', id=12),
    TestItem(name='boiler')
]

the 'id' column in the output holds float values instead of int:

     name    id
0  teapot  11.0
1  kettle  12.0
2  boiler   NaN

df.dtypes also shows that id is now a float64 column:

name     object
id      float64
dtype: object

How can I solve this? In the real project I have several more complicated item classes, so I can't explicitly convert each column to the desired type by hand.

marc_s

1 Answer

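You can cast the id column to pandas' nullable integer dtype. A plain int64 column cannot hold a missing value, so pandas falls back to float64 in order to store NaN; the nullable Int64 dtype stores missing values as <NA> instead.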

>>> df['id'] = df['id'].astype('Int64') # note the capital "I"

>>> print(df.dtypes)
name    object
id       Int64
dtype: object

Output:

     name    id
0  teapot    11
1  kettle    12
2  boiler  <NA>
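
If naming each column is not practical, as in the question, one possible alternative is DataFrame.convert_dtypes(), which asks pandas to pick nullable dtypes for every column at once. The sketch below reuses the TestItem class and the list from the question; the variable name `converted` is only illustrative.

```python
from dataclasses import dataclass, asdict

import pandas as pd


@dataclass
class TestItem:
    name: str = None
    id: int = None


test_item_list = [
    TestItem(name='teapot', id=11),
    TestItem(name='kettle', id=12),
    TestItem(name='boiler'),  # id is left as None
]

# Build the frame as in the question; 'id' comes out as float64 because of the None.
df = pd.json_normalize([asdict(item) for item in test_item_list])

# Let pandas infer nullable dtypes for all columns in one call.
# Float columns whose non-missing values are all whole numbers become Int64.
converted = df.convert_dtypes()

print(converted.dtypes)
print(converted)
```

Note that this also changes other columns (for example, text columns move to a string dtype), and a float column that really contains fractional values is kept as a nullable float rather than turned into Int64.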
  • Thank you for the input. However, as I mentioned above, I don't think it is a good idea to explicitly convert each column: I have many item classes, and this would overcomplicate things. I will try a cast like the one you suggested, but applied dynamically according to the dataclass field type. – Recently_Created_User May 09 '22 at 09:32
  • Actually, after thinking a bit, I realized that I don't have any float values in my data and most likely will not have to work with floats in the current project, so I've decided to simply convert all `float` columns that might appear after json_normalize into `Int64`, as you suggested. – Recently_Created_User May 09 '22 at 10:02
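
The dynamic cast mentioned in the first comment could be sketched roughly as follows. This is only an illustration, not the commenter's actual code: the helper name `cast_int_fields` is hypothetical, and it assumes flat dataclasses whose field names match the column names and whose annotations are the plain `int` type (nested dataclasses produce dotted column names in json_normalize and would need extra handling).

```python
from dataclasses import dataclass, asdict, fields

import pandas as pd


@dataclass
class TestItem:
    name: str = None
    id: int = None


def cast_int_fields(df, item_cls):
    """Hypothetical helper: cast columns annotated as int on item_cls to nullable Int64."""
    int_columns = [
        f.name
        for f in fields(item_cls)
        if f.type is int and f.name in df.columns
    ]
    # astype with a mapping only touches the listed columns.
    return df.astype({col: "Int64" for col in int_columns})


items = [
    TestItem(name='teapot', id=11),
    TestItem(name='kettle', id=12),
    TestItem(name='boiler'),  # id is left as None
]

df = pd.json_normalize([asdict(item) for item in items])
df = cast_int_fields(df, TestItem)

print(df.dtypes)
print(df)
```

Driving the cast from the dataclass annotations keeps genuine float fields untouched, which the blanket "all float columns to Int64" approach from the second comment does not.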