I have a pandas DataFrame with a column in which each value is a list. Each list contains a mix of strings and NaN values (the NaN values are reported as dtype: float). Here are the first two rows:

1    [nan, JavaScript, nan, nan, nan, nan, nan, nan...

2    [Java, nan, nan, nan, nan, nan, SQL, nan, nan,...

I am trying to remove the nan values from each list within the column, so that only the strings remain within each list.

Does anyone have any idea of how I could drop these float values while retaining the string values in an efficient manner?

  • Does this answer your question? [How to drop rows of Pandas DataFrame whose value in a certain column is NaN](https://stackoverflow.com/questions/13413590/how-to-drop-rows-of-pandas-dataframe-whose-value-in-a-certain-column-is-nan) – sushanth Jun 06 '20 at 13:41
  • Drop or replace? You cannot just drop things - imagine dropping column 1, you'd also drop `Java`. – m02ph3u5 Jun 06 '20 at 13:42
  • I don't want to drop the rows specifically, I just want to remove the 'nan' values from each list – Joshua Yosen Jun 08 '20 at 20:26
  • Hey anky, I was referring to m02ph3u5's comment about dropping columns. I finally got time this morning to try out your solutions. Thank you so much! While the first solution threw some errors, the second solution worked like a charm :) – Joshua Yosen Jun 11 '20 at 12:52

2 Answers

Consider this:

df['your_column'] = df['your_column'].map(lambda x: [w for w in x if isinstance(w, str)])

Result:

1    [JavaScript, ...

2    [Java, SQL, ...
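
To make the per-row filtering concrete, here is a minimal self-contained sketch. The sample DataFrame is hypothetical and only mimics the shape of the data in the question. It keeps the entries that are strings, which also discards NaN because NaN is a float:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question's column of mixed lists.
df = pd.DataFrame({
    "your_column": [
        [np.nan, "JavaScript", np.nan],
        ["Java", np.nan, "SQL"],
    ]
})

# Keep only the string entries of each list; NaN is a float, so it is dropped.
df["your_column"] = df["your_column"].map(
    lambda x: [w for w in x if isinstance(w, str)]
)

print(df["your_column"].tolist())
```

Note that this filter would also drop any other non-string values, so it assumes the lists hold only strings and NaN, as described in the question.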
Sy Ker

You can try a list comprehension with pd.notnull():

df['cleaned_col_name'] = [[e for e in i if pd.notnull(e)] for i in df['col_name']]

Or create a DataFrame from the column, stack() it, then aggregate back into lists (the parentheses are needed so the chained call can span two lines):

df['cleaned_col_name'] = (pd.DataFrame(df['col_name'].tolist()).stack()
                            .groupby(level=0).agg(list))

Or with explode:

df['col_name'].explode().dropna().groupby(level=0).agg(list)

Replace col_name and cleaned_col_name in the code with your existing column name and the desired new column name.
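
One difference between these approaches may matter in practice: a row whose list contains only NaN becomes an empty list with the list comprehension, but it forms no group at all in the explode variant, so it is absent from that result. The sketch below illustrates this on hypothetical sample data. The reindex step at the end is an addition of mine, not part of the original answer, and is just one possible way to restore such rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the third row contains only NaN.
df = pd.DataFrame({
    "col_name": [
        [np.nan, "JavaScript", np.nan],
        ["Java", np.nan, "SQL"],
        [np.nan, np.nan],
    ]
})

# List comprehension: every row is kept, and an all-NaN row becomes [].
by_comprehension = [[e for e in row if pd.notnull(e)] for row in df["col_name"]]

# explode/dropna/groupby: a row left with no values forms no group.
by_explode = df["col_name"].explode().dropna().groupby(level=0).agg(list)

# Assumption: restore the missing rows as empty lists before assigning back.
df["cleaned_col_name"] = by_explode.reindex(df.index).apply(
    lambda v: v if isinstance(v, list) else []
)
```

If every row is guaranteed to contain at least one string, the reindex step is unnecessary and the original one-liner can be assigned directly.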

anky