Removing float values from lists within a Pandas Column

Question

I have a pandas DataFrame with a column where each value is a list of elements: a combination of strings and nan values (which show up as dtype: float). Here are the first two elements:

1    [nan, JavaScript, nan, nan, nan, nan, nan, nan...

2    [Java, nan, nan, nan, nan, nan, SQL, nan, nan,...

I am trying to remove the nan values from each list within the column, so that only the strings remain within each list.

Does anyone have any idea of how I could drop these float values while retaining the string values in an efficient manner?

Does this answer your question? [How to drop rows of Pandas DataFrame whose value in a certain column is NaN](https://stackoverflow.com/questions/13413590/how-to-drop-rows-of-pandas-dataframe-whose-value-in-a-certain-column-is-nan) — sushanth, Jun 06 '20 at 13:41
Drop or replace? You cannot just drop things - imagine dropping column 1, you'd also drop `Java`. — m02ph3u5, Jun 06 '20 at 13:42
I don't want to drop the rows specifically, I just want to remove the 'nan' values from each list — Joshua Yosen, Jun 08 '20 at 20:26
Hey anky, I was referring to mo2ph3u5's comments about dropping columns. I finally got time this morning to try out your solutions. Thank you so much! While the first solution threw some errors, the second solution worked like a charm :) — Joshua Yosen, Jun 11 '20 at 12:52

score 0 · Answer 1 · answered Jun 06 '20 at 14:01

Consider this:

df['your_column'] = df['your_column'].map(lambda x: [w for w in x if not isinstance(w, numpy.NaN)])

Result:

1    [JavaScript, ...

2    [Java, SQL, ...

Sy Ker

I receive `TypeError: isinstance() arg 2 must be a type or tuple of types` when attempting this – Joshua Yosen Jun 08 '20 at 20:25
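
The TypeError reported here arises because `numpy.NaN` is a float value, not a type, so it cannot be passed as the second argument to `isinstance()`. (The `numpy.NaN` alias was also removed in NumPy 2.0 in favour of `numpy.nan`.) A minimal sketch of one possible correction follows. The sample data and the helper name `drop_nan` are made up for illustration; `your_column` is the placeholder used in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the column in the question
df = pd.DataFrame({
    "your_column": [
        [np.nan, "JavaScript", np.nan],
        ["Java", np.nan, "SQL"],
    ]
})

def drop_nan(values):
    # NaN is a float *value*, so check for "is a float and is NaN"
    # instead of passing it to isinstance() as if it were a type.
    return [v for v in values if not (isinstance(v, float) and np.isnan(v))]

df["your_column"] = df["your_column"].map(drop_nan)
print(df["your_column"].tolist())
```

The `isinstance(v, float)` guard runs first so that `np.isnan()` is never called on a string, which would raise its own TypeError.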

anky · Accepted Answer · 2020-06-06T14:16:31.320

You can try a list comprehension with pd.notnull():

df['cleaned_col_name'] = [[e for e in i if pd.notnull(e)] for i in df['col_name']]
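
For anyone who wants to try the list comprehension in isolation, here is a small runnable sketch. The sample values and the index are assumptions for illustration, not the asker's actual data. One detail worth knowing: `pd.notnull()` is also False for `None`, so those entries are filtered out as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data: lists mixing strings, NaN and None
df = pd.DataFrame(
    {"col_name": [[np.nan, "JavaScript", None], ["Java", np.nan, "SQL"]]},
    index=[1, 2],
)

# Keep only the elements that are not null (drops NaN and None)
df["cleaned_col_name"] = [
    [e for e in i if pd.notnull(e)] for i in df["col_name"]
]
print(df["cleaned_col_name"].tolist())
```

Because the result is built row by row in the same order as the column, it lines up with the DataFrame regardless of what the index looks like, and a list containing only NaN simply becomes an empty list.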

Or create a DataFrame from the column, stack() it, then aggregate back into lists:

df['cleaned_col_name'] = (pd.DataFrame(df['col_name'].tolist())
                            .stack()
                            .groupby(level=0).agg(list))
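
A hedged sketch of the stack() approach on made-up data is below. It makes two assumptions beyond the answer. First, `index=df.index` is passed so the intermediate DataFrame lines up with a non-default index. Second, an explicit `dropna()` is added after `stack()`: the older `stack()` implementation drops NaN by default, but the newer one (opt-in via `future_stack=True` in pandas 2.1+) does not, so the explicit call should cover both behaviours.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-default index
df = pd.DataFrame(
    {"col_name": [[np.nan, "JavaScript", np.nan], ["Java", np.nan, "SQL"]]},
    index=[1, 2],
)

# One row per original list, one column per list position
wide = pd.DataFrame(df["col_name"].tolist(), index=df.index)

df["cleaned_col_name"] = (
    wide.stack()            # long format: (row label, position) -> value
        .dropna()           # explicit, in case stack() kept the NaN entries
        .groupby(level=0)   # regroup by the original row label
        .agg(list)
)
print(df["cleaned_col_name"].tolist())
```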

Or with explode:

df['col_name'].explode().dropna().groupby(level=0).agg(list)
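
One caveat with the explode() route, shown in the sketch below on made-up data: a row whose list contains only NaN disappears after `dropna()`, so it comes back as NaN rather than an empty list once the result is aligned to the original index. The `reindex()` plus `apply()` step is just one possible way to restore empty lists, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the last row holds nothing but NaN
df = pd.DataFrame(
    {"col_name": [[np.nan, "JavaScript"], ["Java", np.nan, "SQL"], [np.nan, np.nan]]},
    index=[1, 2, 3],
)

cleaned = df["col_name"].explode().dropna().groupby(level=0).agg(list)

# Row 3 is missing from `cleaned`; bring it back as an empty list
df["cleaned_col_name"] = (
    cleaned.reindex(df.index)
           .apply(lambda v: v if isinstance(v, list) else [])
)
print(df["cleaned_col_name"].tolist())
```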

Replace col_name and cleaned_col_name in the code with the existing column name and the desired column name, respectively.
