
I have a data frame with many null records:

Col_1    Col_2    Col_3
10       5        2
22       7        7
3        9        5
4        NaN      NaN
5        NaN      NaN
6        4        NaN
7        6        7
8        10       NaN
12       NaN      1

I want to remove all NaN values from every column. As you can see, each column has a different number of non-null values, so I want to get something like this:

Col_1    Col_2    Col_3
10       5        2
22       7        7
3        9        5
4        4        7
5        6        1
6        10
7
8
12

I tried

filtered_df = df.dropna(how='any')

But it removes all records in the dataframe. How can I do that?

user_01
  • But what would you want the remaining elements to be? DataFrames must have the same number of elements. – Ami Tavory May 18 '18 at 19:39
  • See the second part of my answer, but with "up" instead of "left" – cs95 May 18 '18 at 19:39
  • @coldspeed I'm not really sure of your answer in the other question or its relevance to this one. DataFrames are for tabular data. If the data is a ragged matrix, it's probably not the right data structure. – Ami Tavory May 18 '18 at 21:55
  • 1
    @AmiTavory Fair enough, it may not be obvious how to go about things so I've reopened the question and answered it. – cs95 May 18 '18 at 22:00
  • Done! Thank you for reminding @coldspeed – user_01 May 24 '18 at 15:46

4 Answers


You can also use pd.concat on a list of series.

Note that columns Col_2 and Col_3 are unavoidably upcast to float because of the NaN elements, unless you cast with dtype=object.

res = pd.concat([df[x].dropna().reset_index(drop=True) for x in df], axis=1)

print(res)

   Col_1  Col_2  Col_3
0     10    5.0    2.0
1     22    7.0    7.0
2      3    9.0    5.0
3      4    4.0    7.0
4      5    6.0    1.0
5      6   10.0    NaN
6      7    NaN    NaN
7      8    NaN    NaN
8     12    NaN    NaN
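To match the blank cells in the question's desired output, the trailing NaNs can be replaced with empty strings afterwards. A sketch using the question's data (the frame is reconstructed here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col_1': [10, 22, 3, 4, 5, 6, 7, 8, 12],
    'Col_2': [5, 7, 9, np.nan, np.nan, 4, 6, 10, np.nan],
    'Col_3': [2, 7, 5, np.nan, np.nan, np.nan, 7, np.nan, 1],
})

# drop NaNs per column, realign each column from the top, then blank the gaps
res = pd.concat([df[x].dropna().reset_index(drop=True) for x in df], axis=1)
blanked = res.fillna('')  # empty strings instead of trailing NaN
```

Note that fillna('') turns Col_2 and Col_3 into object-dtype columns mixing floats and strings, so it is best reserved for display.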
jpp

Using Divakar's justify function:

df[:] = justify(df.values, invalid_val=np.nan, axis=0, side='up')
df = df.fillna('')

print(df)

   Col_1 Col_2 Col_3
0   10.0     5     2
1   22.0     7     7
2    3.0     9     5
3    4.0     4     7
4    5.0     6     1
5    6.0    10      
6    7.0            
7    8.0            
8   12.0            
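justify is not part of pandas; it is a NumPy helper from Divakar's answer. A minimal sketch of it (adapted, not verbatim):

```python
import numpy as np

def justify(a, invalid_val=0, axis=1, side='left'):
    """Push the valid values of `a` to one side along `axis`,
    filling the remaining cells with `invalid_val`."""
    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # sorting booleans pushes True values to one end; flip for 'up'/'left'
    justified_mask = np.sort(mask, axis=axis)
    if side in ('up', 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val, dtype=a.dtype)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out
```

With axis=0 and side='up', each column's non-NaN values are shifted to the top, which is exactly what the answer above relies on.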
cs95
  • The technical proficiency is impressive. Fundamentally, I like it less, for the reasons I outlined in my answer. – Ami Tavory May 18 '18 at 22:44

As you could see, each column has different number of rows.

A DataFrame is a tabular data structure: you can look up an index and a column, and find the value. If the number of rows differs per column, then the index is meaningless and misleading. A dict might be a better alternative:

{c: df[c].dropna().values for c in df.columns}

or

{c: list(df[c]) for c in df.columns}
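With the question's data (reconstructed here for illustration), the first comprehension yields a ragged dict of arrays, one per column, each at its own natural length:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col_1': [10, 22, 3, 4, 5, 6, 7, 8, 12],
    'Col_2': [5, 7, 9, np.nan, np.nan, 4, 6, 10, np.nan],
    'Col_3': [2, 7, 5, np.nan, np.nan, np.nan, 7, np.nan, 1],
})

# each column keeps only its own valid values, so lengths may differ
ragged = {c: df[c].dropna().values for c in df.columns}
```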
Ami Tavory
  • 1
    I completely 100% agree with every word here. I just think most housekeeping questions like this are spurred by display/representation for which I think questions like these are OK. Otherwise, it's recommended one cajoles the OP to uncover the X in the XY problem here... :)... – cs95 May 18 '18 at 22:58

You can also try this:

censos_data.dropna(subset=censos_data.columns, inplace=True)

Note that with subset set to all columns this drops every row containing at least one NaN, so it behaves the same as the dropna(how='any') the question already tried.