
I have a DataFrame with multiple columns. Many of the cells contain NaN values, and I want to drop only those cells, not the entire row or column. The DataFrame looks something like this:

Column1 | Column2 | Column3 | ... | ColumnX
1       | NaN     | NaN     | ... | NaN
2       | NaN     | NaN     | ... | NaN
.
.
.
12      | 1       | NaN     | ... | NaN
13      | 2       | NaN     | ... | NaN
.
.
.
21      | 11      | 1       | ... | NaN
22      | 12      | 2       | ... | NaN

and so on.

The final output should look like

Column1 | Column2 | Column3 | ... | ColumnX
1       | 1       | 1       | ... | 1
2       | 2       | 2       | ... | 2
.
.
.
12      | 12      | 11      | ... | 11
13      | 13      | 12      | ... | 12
.
.
.
21      | 21      | 21      | ... | 21
22      | 22      | 22      | ... | 22

Any idea if this can be achieved?

sanster9292
  • Please share the expected output based on your input dataframe also. – Mayank Porwal Nov 14 '21 at 07:40
  • You can't "drop a cell"... you could change it to something else. – Joran Beasley Nov 14 '21 at 07:43
  • @JoranBeasley Yes, that's what I thought and what I am doing, but I was wondering if there was a way that I wasn't aware of. – sanster9292 Nov 14 '21 at 07:44
  • Do you mean like [How to move Nan values to end in all columns](https://stackoverflow.com/questions/52621834/how-to-move-nan-values-to-end-in-all-columns)? Then dropna the nan rows? Or like [Remove NaN 'Cells' without dropping the entire ROW (Pandas,Python3)](https://stackoverflow.com/q/25941979/15497888)? – Henry Ecker Nov 14 '21 at 07:45

1 Answer


From your expected output, it looks like you want to "count" the NaNs in each column and replace each one with its occurrence number.

A quick way to achieve this could be the following:

  • you define a function which does the substitutions you need

    import pandas as pd
    import numpy as np
    
    def sub(x):
        x = x.copy()  # work on a copy so the original column is not modified
        mask = x.isna()  # True where the value is NaN
        # number the NaNs 1, 2, 3, ...; apply ANY transformation you need here
        x.loc[mask] = np.arange(1, mask.sum() + 1)
        return x
    
  • you apply that to each column (let's define a simple dataframe first):

    dt = pd.DataFrame({"col1":[1,2,3], "col2":[np.nan, np.nan, 1], "col3":[1,np.nan,2]})
    
    dt.apply(sub)
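
As a possible alternative to the column-wise `apply` above, the same NaN numbering can be sketched for the whole frame at once with `isna()`, `cumsum()` and `where()`. The helper name `number_nans` is hypothetical, and the sketch assumes the goal really is to replace each NaN with its 1-based occurrence count within its column, which is only one reading of the expected output:

```python
import numpy as np
import pandas as pd


def number_nans(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df where each NaN is replaced by its 1-based
    occurrence count within its column (hypothetical helper)."""
    is_nan = df.isna()
    # Running count of NaNs down each column; at a NaN position this
    # equals that NaN's occurrence number.
    counts = is_nan.cumsum()
    # Keep the original value where it is present, otherwise take the count.
    return df.where(~is_nan, counts)


dt = pd.DataFrame({"col1": [1, 2, 3],
                   "col2": [np.nan, np.nan, 1],
                   "col3": [1, np.nan, 2]})
result = number_nans(dt)
print(result)
```

Because `where` returns a new object, `dt` itself is left unchanged. If a different replacement is needed, only the `counts` frame would have to change.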
    
nikeros