
I have a data frame with many null records:

Col_1    Col_2    Col_3
10       5        2
22       7        7
3        9        5
4        NaN      NaN
5        NaN      NaN
6        4        NaN
7        6        7
8        10       NaN
12       NaN      1

I want to remove all NaN values from every column. As you can see, each column has a different number of non-null values, so I want to get something like this:

Col_1    Col_2    Col_3
10       5        2
22       7        7
3        9        5
4        4        7
5        6        1
6        10
7
8
12

I tried

filtered_df = df.dropna(how='any')

But it removes all records in the dataframe. How can I do that?

user_01
  • But what would you want the remaining elements to be? DataFrames must have the same number of elements. – Ami Tavory May 18 '18 at 19:39
  • See the second part of my answer, but with "up" instead of "left" – cs95 May 18 '18 at 19:39
  • @coldspeed I'm not really sure of your answer in the other question or its relevance to this one. DataFrames are for tabular data. If the data is a ragged matrix, it's probably not the right data structure. – Ami Tavory May 18 '18 at 21:55
  • 1
    @AmiTavory Fair enough, it may not be obvious how to go about things so I've reopened the question and answered it. – cs95 May 18 '18 at 22:00
  • Done! Thank you for reminding @coldspeed – user_01 May 24 '18 at 15:46

4 Answers


You can also use pd.concat on a list of series.

Note that columns Col_2 and Col_3 are unavoidably upcast to float because of the NaN elements, unless you cast with dtype=object.

res = pd.concat([df[x].dropna().reset_index(drop=True) for x in df], axis=1)

print(res)

   Col_1  Col_2  Col_3
0     10    5.0    2.0
1     22    7.0    7.0
2      3    9.0    5.0
3      4    4.0    7.0
4      5    6.0    1.0
5      6   10.0    NaN
6      7    NaN    NaN
7      8    NaN    NaN
8     12    NaN    NaN
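To match the blank cells in the question's desired output, the trailing NaNs can be replaced with empty strings afterwards. A sketch using the question's data (the frame is reconstructed here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col_1': [10, 22, 3, 4, 5, 6, 7, 8, 12],
    'Col_2': [5, 7, 9, np.nan, np.nan, 4, 6, 10, np.nan],
    'Col_3': [2, 7, 5, np.nan, np.nan, np.nan, 7, np.nan, 1],
})

# drop NaNs per column, realign each column from the top, then blank the gaps
res = pd.concat([df[x].dropna().reset_index(drop=True) for x in df], axis=1)
blanked = res.fillna('')  # empty strings instead of trailing NaN
```

Note that fillna('') turns Col_2 and Col_3 into object-dtype columns mixing floats and strings, so it is best reserved for display.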
jpp

Using Divakar's justify function:

df[:] = justify(df.values, invalid_val=np.nan, axis=0, side='up')
df = df.fillna('')

print(df)

   Col_1 Col_2 Col_3
0   10.0     5     2
1   22.0     7     7
2    3.0     9     5
3    4.0     4     7
4    5.0     6     1
5    6.0    10      
6    7.0            
7    8.0            
8   12.0            
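justify is not part of pandas; it is a NumPy helper from Divakar's answer. A minimal sketch of it (adapted, not verbatim):

```python
import numpy as np

def justify(a, invalid_val=0, axis=1, side='left'):
    """Push the valid values of `a` to one side along `axis`,
    filling the remaining cells with `invalid_val`."""
    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # sorting booleans pushes True values to one end; flip for 'up'/'left'
    justified_mask = np.sort(mask, axis=axis)
    if side in ('up', 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val, dtype=a.dtype)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out
```

With axis=0 and side='up', each column's non-NaN values are shifted to the top, which is exactly what the answer above relies on.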
cs95
  • The technical proficiency is impressive. Fundamentally, I like it less, for the reasons I outlined in my answer. – Ami Tavory May 18 '18 at 22:44

As you could see, each column has different number of rows.

A DataFrame is a tabular data structure: you can look up an index and a column, and find the value. If the number of rows differs per column, then the index is meaningless and misleading. A dict might be a better alternative:

{c: df[c].dropna().values for c in df.columns}

or

{c: list(df[c]) for c in df.columns}
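With the question's data (reconstructed here for illustration), the first comprehension yields a ragged dict of arrays, one per column, each at its own natural length:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col_1': [10, 22, 3, 4, 5, 6, 7, 8, 12],
    'Col_2': [5, 7, 9, np.nan, np.nan, 4, 6, 10, np.nan],
    'Col_3': [2, 7, 5, np.nan, np.nan, np.nan, 7, np.nan, 1],
})

# each column keeps only its own valid values, so lengths may differ
ragged = {c: df[c].dropna().values for c in df.columns}
```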
Ami Tavory
  • 1
    I completely 100% agree with every word here. I just think most housekeeping questions like this are spurred by display/representation for which I think questions like these are OK. Otherwise, it's recommended one cajoles the OP to uncover the X in the XY problem here... :)... – cs95 May 18 '18 at 22:58

You can also try this:

censos_data.dropna(subset=censos_data.columns, inplace=True)

Note that with subset set to all columns this drops every row containing at least one NaN, so it behaves the same as the dropna(how='any') the question already tried.