Move the data from the top to the bottom of the dataframe (df that has varying length index for its columns)

Question

I have a df that looks something like the below

Index   Col1   Col2  Col3  Col4   Col5      
 0      12     121   346   abc    747
 1      156    121   146   68     75967
 2      234   121    346   567   
 3      gj    161    646   
 4      214   171   
 5      fhg

.... .....

And I want to make the dataframe appear such that the columns where there are null values, the columns move/shift their data to the bottom of the dataframe. Eg it should look like:

Index   Col1   Col2  Col3  Col4   Col5      
 0      12     
 1      156    121   
 2      234   121    346   
 3      gj    121    146   abc 
 4      214   161    346   68    747
 5      fhg   171    646   567   75967

I have thought along the lines of shift and/or justify. However not sure how it can be accomplished in the most efficient way for a large dataframe

@ Mohamed Thasin ah : yes it is working and I have already upvoted your solution — asimo, Nov 28 '18 at 07:13

jezrael · Answer 1 · 2018-11-27T08:12:46.403

You can use a bit changed justify function for working also with non numeric values:

def justify(a, invalid_val=0, axis=1, side='left'):    
    """
    Justifies a 2D array

    Parameters
    ----------
    A : ndarray
        Input array to be justified
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    if invalid_val is np.nan:
        mask = pd.notnull(a)
    else:
        mask = a!=invalid_val
    justified_mask = np.sort(mask,axis=axis)
    if (side=='up') | (side=='left'):
        justified_mask = np.flip(justified_mask,axis=axis)
    out = np.full(a.shape, invalid_val, dtype=object) 
    if axis==1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

arr = justify(df.values, invalid_val=np.nan, side='down', axis=0)

df = pd.DataFrame(arr, columns=df.columns, index=df.index).astype(df.dtypes)
print (df)
  Col1 Col2 Col3 Col4   Col5
0   12  NaN  NaN  NaN    NaN
1  156  121  NaN  NaN    NaN
2  234  121  346  NaN    NaN
3   gj  121  346  567    NaN
4  214  121  346  567  75967
5  fhg  121  346  567  75967

Thats great, how can i maintain the datatype of the columns. It looks like this is converting to all strings. — asimo, Nov 27 '18 at 08:11
@asimo - in numpy it is problem :( Only working casting back by `.astype(df.dtypes)` — jezrael, Nov 27 '18 at 08:13
no worries, while this is working as a gem, and i can simply use another line(s) at my end to have the datatype I want.. Cheers..i'll tick the answer — asimo, Nov 27 '18 at 08:15

score 1 · Answer 2 · answered Nov 27 '18 at 08:15

I tried this,

t=df.isnull().sum()
for val in zip(t.index.values,t.values):
    df[val[0]]=df[val[0]].shift(val[1])
print df

Output:

   Index Col1   Col2   Col3 Col4  Col5      
0      0   12    NaN    NaN  NaN         NaN
1      1  156  121.0    NaN  NaN         NaN
2      2  234  121.0  346.0  NaN         NaN
3      3   gj  121.0  146.0  abc         NaN
4      4  214  161.0  346.0   68       747.0
5      5  fhg  171.0  646.0  567     75967.0

Note: Here I have used loop, may be not a better solution, but it will give you an idea to solve this.

Move the data from the top to the bottom of the dataframe (df that has varying length index for its columns)

2 Answers2

Linked