I have a DataFrame df without explicitly specified dtypes. It is a conditional frequency table whose headers are organized as follows:

Data Attributes excluding X | freq_v columns for all values v of X

I obtain the frequency columns by performing an outer join, which introduces NaN values into the DataFrame. df.fillna(0) worked perfectly until I discretized my original dataset data (also a DataFrame) using pd.cut(). Now I receive a ValueError.

What I've tried so far:

for header in list(df):
        if 'freq_' in header:
            catcol = pd.Series(df[header], dtype='category')
            catcol.cat.add_categories(0)
            catcol.fillna(0)
            cft[header] = catcol

This is supposed to take the frequency columns out of the DataFrame, convert them to categorical Series so that I am allowed to introduce the new category, and apply fillna() before overwriting the original column with the Series. However, it still throws the exact same error. How can I do this better?

Noc
  • Sample input and output would help understanding of the problem, please provide a [mcve] – G. Anderson May 15 '20 at 14:52
  • Does this answer your question? [Pandas fillna throws ValueError: fill value must be in categories](https://stackoverflow.com/questions/53664948/pandas-fillna-throws-valueerror-fill-value-must-be-in-categories) – Dave May 15 '20 at 14:55
  • you do `add_categories` that is good, but you need to reassign it, otherwise it is "lost", so do `catcol = catcol.cat.add_categories(0)` and same with `fillna` – Ben.T May 15 '20 at 15:01
  • Thank you Ben! That did it. – Noc May 15 '20 at 15:09
  • @Dave I read it before asking but I did not succeed in solving the problem regardless. – Noc May 15 '20 at 15:10

1 Answer

As explained by Ben.T, cat.add_categories returns a new Series instead of modifying the existing one, and the same is true of fillna, so both results have to be reassigned:

for header in list(df):
        if 'freq_' in header:
            catcol = pd.Series(df[header], dtype='category')
            # both calls return a new Series, so reassign the result
            catcol = catcol.cat.add_categories(0)
            catcol = catcol.fillna(0)
            cft[header] = catcol
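
For reference, here is a minimal self-contained sketch of the same idea. The frame and its column names (attr, freq_x1, freq_x2) are made up for illustration and are not my real data; the result is written back into df itself, and the check against the existing categories is an extra precaution I added because add_categories raises if the category is already present.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frequency table (column names are invented).
df = pd.DataFrame({
    "attr": ["a", "b", "c"],
    "freq_x1": pd.Categorical([1, np.nan, 3]),
    "freq_x2": pd.Categorical([np.nan, 2, np.nan]),
})

for header in df.columns:
    if header.startswith("freq_"):
        col = df[header]
        # add_categories returns a new Series; skip it if 0 is already a category
        if 0 not in col.cat.categories:
            col = col.cat.add_categories(0)
        # fillna also returns a new Series, so assign the result back
        df[header] = col.fillna(0)
```

If the frequency columns are only ever used as numbers, converting them to a numeric dtype with astype(float) before filling might be simpler, but I have not checked that against my actual data.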
Noc