
Hi guys,

I have a column in a pandas DataFrame where each row holds a list of values. A sample is shown below.

print(df4['List']) 

0 [8,9,10,25,14,25,14,17,19,30]
1 [nan,85,48,75,nan,96,32,14,15,21,28,17,nan]
2 [nan,85,48,75,nan,]
3 [1,nan]
4 [85,75,41,nan]
5 [nan,65,34]

How can I remove these 'nan' values from my lists?

I tried some of Python's conventional list methods, but I can't get them to produce the same result on a pandas DataFrame.

For example, this attempt:

while True:
    try:
        df4['PNs NaNs Removed'] = df4['List'].delete([nan])
    except ValueError:
        break
Caio Euzébio
  • Why are you storing a list in a pandas DataFrame? What are you trying to do? – AMC Nov 01 '19 at 19:34
  • I created a column where each row contains a list of that row's values. I want to remove the nan values from this column and keep the valid values. – Caio Euzébio Nov 04 '19 at 11:34
  • I understand your problem, I was curious as to why you were storing lists inside a DataFrame in the first place. The values in your list are actual NaN values, right, not strings like "nan"? – AMC Nov 04 '19 at 21:30

2 Answers


I tried to avoid explicit iteration by using the Series.dropna function:

def no_nan(listy):
    return list(pd.Series(listy).dropna())

df4['List'] = df4['List'].apply(no_nan)
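
For illustration, here is a minimal, self-contained sketch of what this does to one of the sample lists, assuming the missing entries are numpy's np.nan rather than the string "nan":

import numpy as np
import pandas as pd

def no_nan(listy):
    # Convert the list to a Series, drop the NaN entries, convert back to a list
    return list(pd.Series(listy).dropna())

sample = [np.nan, 85, 48, 75, np.nan]   # one of the lists from the question
print(no_nan(sample))                   # [85.0, 48.0, 75.0]

Note that the integers come back as floats, because the intermediate Series is cast to float64.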
mermaldad
  • The code runs without errors, however the ```nan``` values remain in the list. – Caio Euzébio Nov 01 '19 at 19:53
  • That's odd, it worked for me. What happens if you create a new column, i.e. df4['List2'] = df4['List'].apply(no_nan) ? – mermaldad Nov 01 '19 at 20:01
  • ```KeyError: 'List2'``` – Caio Euzébio Nov 04 '19 at 11:58
  • Hmm. I still don't see what we're doing differently. One more try: what do you get from x = df4['List'].apply(no_nan)? That is, what is the value of x afterwards? – mermaldad Nov 04 '19 at 21:57
  • I tested this and I can confirm that it works. Any idea what the differences between this and the list comprehension method are? – AMC Nov 04 '19 at 22:04
  • I actually just found one myself: the conversion to a Series can mess with the types. The example I tested was a list of integers and `NaN`s, and it cast the integers to floats. So that's something to be aware of (see the sketch after these comments). – AMC Nov 04 '19 at 22:13
  • The list comprehension method is an iteration at the Python level, whereas Series.dropna() doesn't require Python to iterate. So it should be faster for large datasets. – mermaldad Nov 04 '19 at 22:16
  • @mermaldad I don't know why I didn't just run some benchmarks when I made that comment lol. I also share your intuition that it should be faster, I will benchmark it tomorrow and report back! – AMC Nov 05 '19 at 04:14
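
A minimal sketch of the casting behaviour mentioned in the comments above, assuming the lists contain plain Python ints mixed with np.nan; the .astype(int) call at the end is just one possible workaround, not part of either answer:

import numpy as np
import pandas as pd

mixed = [1, np.nan, 2]
s = pd.Series(mixed)                    # dtype becomes float64, so 1 -> 1.0
print(list(s.dropna()))                 # [1.0, 2.0]
print(list(s.dropna().astype(int)))     # [1, 2], if you need the ints back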

It turns out that there are many different ways to indicate and handle NaN values, and it can get quite messy. This solution tests values using pandas.isna(), which should work for a wider variety of values than numpy.isnan().

import pandas as pd


# Keep only the items in each list that are not missing values (NaN/None)
df4['List'] = df4['List'].apply(lambda col_val: [item for item in col_val if not pd.isna(item)])
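
A minimal sketch of the difference, using a few values picked here purely for illustration: pd.isna() handles None and non-numeric objects, while np.isnan() only accepts numeric input and raises a TypeError for anything else.

import numpy as np
import pandas as pd

values = [np.nan, None, 85, "nan"]
print([pd.isna(v) for v in values])   # [True, True, False, False]
# np.isnan(None) or np.isnan("nan") would raise a TypeError

So the list comprehension drops only real missing values and keeps the string "nan", which is why the distinction raised in the question's comments matters.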
AMC