I would like to change the default behavior of pandas so that it fills empty elements with something other than float('nan'), without modifying the source code. I can easily replace NaN after I've created a DataFrame using DataFrame.fillna(), but I would instead like to change this behavior for all DataFrames before they are even instantiated.

In my application, I use a library that for some reason crashes if a float('nan') appears in Python, so my idea is to change the pandas default fill so that float('nan') never appears. Is there a way to change the default behaviour?

EDIT:

I tried pd.set_option('mode.use_inf_as_na', True), to no avail.

https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html

https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html

likethevegetable
  • [That's been asked here](https://stackoverflow.com/questions/41460685/does-pandas-have-a-default-fill-value-when-constructing-a-dataframe-of-a-specifi) and apparently no one had the answer ;} I'd guess pandas doesn't have a built-in option for that? – rafaelc Oct 03 '19 at 19:31
  • I edited the question with an attempt, but it did not work. – likethevegetable Oct 03 '19 at 19:37
  • how are you building your dataframe? with `pd.DataFrame()` or `pd.read_csv()` or something else? – Mason Caiby Oct 03 '19 at 19:50
  • @MasonCaiby, I am building the DataFrame with `pd.DataFrame()`; the data comes from a Python list. – likethevegetable Oct 03 '19 at 19:52
  • `pd.DataFrame()` does not have that functionality, but you can just replace the items in your list with a number, e.g. `my_list = [i if i != 'bad' else 0 for i in a]` – Mason Caiby Oct 03 '19 at 20:04
  • The problem is that as soon as NaN appears, the program crashes. Perhaps it would have been better for me to ask "how to change the `pd.concat()` behavior to avoid NaN". – likethevegetable Oct 03 '19 at 20:10

1 Answer

Missing data in pandas is represented by NaN. The option above, pd.set_option('mode.use_inf_as_na', True), simply tells pandas to treat inf as missing (NaN) as well; it does not change the value pandas uses for missing data.

As an example, after setting the option, isna() also flags inf:

In [7]: pd.DataFrame([np.inf, 2, 3, np.inf]).isna()                                                                               
Out[7]: 
       0
0  False
1  False
2  False
3  False

In [8]: pd.set_option('mode.use_inf_as_na', True)                                                                                 

In [9]: pd.DataFrame([np.inf, 2, 3, np.inf]).isna()                                                                               
Out[9]: 
       0
0   True
1  False
2  False
3   True

Currently pandas has no option for changing the default fill value, so sorry, this is not really a solution to your problem.

As the comment above says, you are better off replacing the missing values before initialising your pd.DataFrame().

For example, replacing None with zero:

list_from_source_code = [None, 2, 3, 4, None, 6, 7]
clean_list_from_source_code = [0 if i is None else i for i in list_from_source_code]

In [4]: pd.DataFrame(clean_list_from_source_code).head(3)                                                                         
Out[4]: 
   0
0  0
1  2
2  3
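
The same "clean before pandas fills" idea can be carried over to the `pd.concat()` case mentioned in the comments. The following is only a sketch under assumptions: `concat_without_nan` is a hypothetical helper name, not a pandas function, and it assumes the NaN comes from frames that have different columns. It aligns every frame to the union of columns with `DataFrame.reindex(..., fill_value=...)` first, so `pd.concat()` has no gaps left to fill.

```python
import pandas as pd


def concat_without_nan(frames, fill_value=0):
    """Hypothetical helper: align columns first so concat never inserts NaN."""
    # Collect the union of all column labels, in first-seen order.
    all_columns = []
    for frame in frames:
        for col in frame.columns:
            if col not in all_columns:
                all_columns.append(col)

    # reindex creates the missing columns already filled with fill_value.
    aligned = [
        frame.reindex(columns=all_columns, fill_value=fill_value)
        for frame in frames
    ]
    return pd.concat(aligned, ignore_index=True)


a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3], "y": [4]})
result = concat_without_nan([a, b])
print(result)
```

This only covers NaN introduced by mismatched columns; any NaN already present in the input frames passes through unchanged and would still need to be replaced at the source.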
RK1
  • Thank you for the explanation. Hopefully they add that capability in the future. Regardless, I should figure out why the API crashes... – likethevegetable Oct 04 '19 at 14:18
  • Yeah, that's strange, since `type(np.nan) == float` is true by definition. I would take a look at the traceback; it might actually be something else that is just triggered by the `nan` occurrence. – RK1 Oct 04 '19 at 14:29