
I have a df like this:

   num1    num2
0  [2.0]   10
1  [3.0]   20
2  [4.0]   30
3  [5.0]   40
4  [6.0]   50
5  [nan]   60 
6  [nan]   70
7  [10.0]  80
8  [nan]   90
9  [15.0]  100

The num1 column contains single-element numpy arrays of floats; [nan] is a numpy array containing a single np.NaN.
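
For reference, the frame can be rebuilt like this (a minimal sketch; the values come from the table above):

import numpy as np
import pandas as pd

# every num1 cell holds a one-element numpy array; three of them hold np.NaN
df = pd.DataFrame({
    'num1': [np.array([v]) for v in [2.0, 3.0, 4.0, 5.0, 6.0, np.nan,
                                     np.nan, 10.0, np.nan, 15.0]],
    'num2': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})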

I am converting this to integers via this:

df['num1'] = list(map(int, df['num1']))

If I just use this df:

   num1    num2
0  [2.0]   10
1  [3.0]   20
2  [4.0]   30
3  [5.0]   40
4  [6.0]   50

This works when there are no [nan] and I get:

   num1   num2
0  2.0  10
1  3.0  20
2  4.0  30
3  5.0  40
4  6.0  50

But if I include the full df with [nan] I get the error:

`ValueError: cannot convert float NaN to integer`

I tried doing:

df[df['num1'] != np.array(np.NaN)]

But this gave the error:

TypeError: len() of unsized object

How can I get the desired output:

   num1    num2
0  2.0   10
1  3.0   20
2  4.0   30
3  5.0   40
4  6.0   50
5  10.0  80
6  15.0  100
Chuck

5 Answers


This should get rid of all those nan lists; just add the following:

df = df.loc[df['num1'].str[0].dropna().index]

Then you can run the rest of your code as it is.
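
Put together with the conversion from the question, the whole thing might look like this (just a sketch, assuming the DataFrame from the question):

df = df.loc[df['num1'].str[0].dropna().index]   # keep only rows whose num1 array holds a real number
df['num1'] = list(map(int, df['num1']))         # the original conversion now runs without hitting NaN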

O.Suleiman

Try this -

df['num1'] = df['num1'].apply(lambda x: x[0])  # unlist the list of numbers (assuming you don't have multiple)
df = df.dropna(subset=['num1'])                # drop the rows that held [nan]
df['num1'] = list(map(int, df['num1']))        # map operation
print(df)

Output

   num1  num2
0     2    10
1     3    20
2     4    30
3     5    40
4     6    50
7    10    80
9    15   100

Timings (depends on size of data)

# My solution
# 2.6 ms ± 327 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# @O.Suleiman's solution
# 2.8 ms ± 457 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# @Anton vBR's solution
# 2.96 ms ± 504 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Vivek Kalyanarangan
  • Thanks for the answer. I just updated my output (I had not removed the list following the `map`). So your answer is still valid, just don't need the last line to convert back to list. – Chuck Feb 22 '18 at 13:13
  • @fabianegli Worked for me ¯\_(ツ)_/¯ (for me the `dropna` is on a separate line, as `df = df.dropna(subset='num1')`). – Chuck Feb 22 '18 at 14:40
  • @Chuck My issue is the fact that you have to change the code to make it work. The way it is now raises a ValueError. If this solution is edited to contain the actual code that works, I will remove my downvote. – fabianegli Feb 24 '18 at 04:13
df['num1'] = df.num1.str[0]
df.dropna(axis=0, inplace=True)

A solution inspired by O.Suleiman's answer, but without using loc. Here is the output:

   num1  num2
0   2.0    10
1   3.0    20
2   4.0    30
3   5.0    40
4   6.0    50
7  10.0    80
9  15.0   100
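
If the integer column from the question is still wanted, the map step from the question should now work on the cleaned column, for example:

df['num1'] = list(map(int, df['num1']))   # num1 is now a plain float column with no NaN left
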
Espoir Murhabazi

You can do it as below:

# take the single float out of each np array (a [nan] array just yields np.NaN)
df['num1'] = df['num1'].apply(lambda x: x[0])

# use dropna to drop those rows, then renumber the index
df = df.dropna(subset=['num1']).reset_index(drop=True)
print(df)

Output:

   num1    num2
0  2.0   10
1  3.0   20
2  4.0   30
3  5.0   40
4  6.0   50
5  10.0  80
6  15.0  100
Sociopath

As you can see, there are many options. You can convert to numeric and then remove the nulls:

import pandas as pd
import numpy as np

# a small frame including a value that cannot be converted to a number
data = dict(num1=[[2.0], [np.nan], ['apple']])

df = pd.DataFrame(data)

# index of the rows whose first element is a real number
m = pd.to_numeric(df['num1'].apply(lambda x: x[0]), errors='coerce').dropna().index

df = df.loc[m]
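
From here, the surviving single-element lists can be unwrapped the same way as in the other answers, for example:

df['num1'] = df['num1'].apply(lambda x: x[0])   # pull the number out of each remaining list
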
Anton vBR