2

I have a df that looks something like the below

Index   Col1   Col2  Col3  Col4   Col5      
 0      12     121   346   abc    747
 1      156    121   146   68     75967
 2      234   121    346   567   
 3      gj    161    646   
 4      214   171   
 5      fhg   

.... .....

And I want to make the dataframe appear such that the columns where there are null values, the columns move/shift their data to the bottom of the dataframe. Eg it should look like:

Index   Col1   Col2  Col3  Col4   Col5      
 0      12     
 1      156    121   
 2      234   121    346   
 3      gj    121    146   abc 
 4      214   161    346   68    747
 5      fhg   171    646   567   75967

I have thought along the lines of shift and/or justify. However not sure how it can be accomplished in the most efficient way for a large dataframe

Malik Asad
  • 441
  • 4
  • 15
asimo
  • 2,340
  • 11
  • 29

2 Answers2

2

You can use a bit changed justify function for working also with non numeric values:

def justify(a, invalid_val=0, axis=1, side='left'):    
    """
    Justifies a 2D array

    Parameters
    ----------
    A : ndarray
        Input array to be justified
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    if invalid_val is np.nan:
        mask = pd.notnull(a)
    else:
        mask = a!=invalid_val
    justified_mask = np.sort(mask,axis=axis)
    if (side=='up') | (side=='left'):
        justified_mask = np.flip(justified_mask,axis=axis)
    out = np.full(a.shape, invalid_val, dtype=object) 
    if axis==1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

arr = justify(df.values, invalid_val=np.nan, side='down', axis=0)

df = pd.DataFrame(arr, columns=df.columns, index=df.index).astype(df.dtypes)
print (df)
  Col1 Col2 Col3 Col4   Col5
0   12  NaN  NaN  NaN    NaN
1  156  121  NaN  NaN    NaN
2  234  121  346  NaN    NaN
3   gj  121  346  567    NaN
4  214  121  346  567  75967
5  fhg  121  346  567  75967
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thats great, how can i maintain the datatype of the columns. It looks like this is converting to all strings. – asimo Nov 27 '18 at 08:11
  • @asimo - in numpy it is problem :( Only working casting back by `.astype(df.dtypes)` – jezrael Nov 27 '18 at 08:13
  • 1
    no worries, while this is working as a gem, and i can simply use another line(s) at my end to have the datatype I want.. Cheers..i'll tick the answer – asimo Nov 27 '18 at 08:15
1

I tried this,

t=df.isnull().sum()
for val in zip(t.index.values,t.values):
    df[val[0]]=df[val[0]].shift(val[1])
print df

Output:

   Index Col1   Col2   Col3 Col4  Col5      
0      0   12    NaN    NaN  NaN         NaN
1      1  156  121.0    NaN  NaN         NaN
2      2  234  121.0  346.0  NaN         NaN
3      3   gj  121.0  146.0  abc         NaN
4      4  214  161.0  346.0   68       747.0
5      5  fhg  171.0  646.0  567     75967.0

Note: Here I have used loop, may be not a better solution, but it will give you an idea to solve this.

Mohamed Thasin ah
  • 10,754
  • 11
  • 52
  • 111