I am working on a pandas problem and would like to know whether there is an easy fix for it.

My pandas tables always have a format like this:

df = pd.DataFrame({'A': [1, 2, np.nan, np.nan, 3], 'B': [2, 3, np.nan, 5, 2], 'C': [2, 3, 7, 5, 9], 'D': [1, 2, 3, np.nan, np.nan]})

This dataframe should be transformed to:

df = pd.DataFrame({'A': [1, 2, 7, 5, 3], 'B': [2, 3, 3, 5, 2], 'C': [2, 3, np.nan, np.nan, 9], 'D': [1, 2, np.nan, np.nan, np.nan]})

This means that, within each row, all values need to be shifted to the left as far as possible. (The first column is filled first, followed by the second, and so on.) Is there an easy way to do this?

Many thanks in advance.

Beertje

1 Answer

Use the custom function justify; you only need to convert the DataFrame to a NumPy array first:

import numpy as np
import pandas as pd

# justify() is the helper defined in https://stackoverflow.com/a/44559180/2901002
df = pd.DataFrame(justify(df.to_numpy(), invalid_val=np.nan), columns=df.columns)
# pandas < 0.24 (no DataFrame.to_numpy):
# df = pd.DataFrame(justify(df.values, invalid_val=np.nan), columns=df.columns)
print(df)
     A    B    C    D
0  1.0  2.0  2.0  1.0
1  2.0  3.0  3.0  2.0
2  7.0  3.0  NaN  NaN
3  5.0  5.0  NaN  NaN
4  3.0  2.0  9.0  NaN
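
Since justify is not part of pandas or NumPy, the snippet above only runs once that helper is defined. The following is a minimal sketch of such a function, modeled on the idea in the linked answer (sort a validity mask, then scatter the valid values into the sorted positions). The exact signature and the float conversion are assumptions made for this illustration, not a copy of the linked code.

```python
import numpy as np
import pandas as pd


def justify(a, invalid_val=np.nan, axis=1, side="left"):
    """Push valid entries of a 2D array to one side (illustrative sketch)."""
    a = np.asarray(a, dtype=float)  # assumption: numeric data only
    # True where the entry is valid (not the invalid marker)
    if isinstance(invalid_val, float) and np.isnan(invalid_val):
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # Sorting booleans puts False first and True last along the axis
    justified_mask = np.sort(mask, axis=axis)
    if side in ("left", "up"):
        # Flip so the True slots sit at the start instead of the end
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val, dtype=float)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out


df = pd.DataFrame({'A': [1, 2, np.nan, np.nan, 3],
                   'B': [2, 3, np.nan, 5, 2],
                   'C': [2, 3, 7, 5, 9],
                   'D': [1, 2, 3, np.nan, np.nan]})

result = pd.DataFrame(justify(df.to_numpy(), invalid_val=np.nan),
                      columns=df.columns)
print(result)
```

Because the work is done on the whole array at once, this approach avoids a Python-level loop over rows.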

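If performance is not important, use DataFrame.apply with Series.dropna and the Series constructor (note that the resulting columns are labeled 0, 1, 2, ... instead of keeping the original names):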

df = df.apply(lambda x: pd.Series(x.dropna().to_numpy()), axis=1)
# pandas < 0.24 (no Series.to_numpy):
# df = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)
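
With the apply approach, the result carries integer column labels rather than the original ones. A possible way to restore them, sketched here with the question's sample data, is to reindex to the full width and reassign the column names. The variable name shifted is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, np.nan, 3],
                   'B': [2, 3, np.nan, 5, 2],
                   'C': [2, 3, 7, 5, 9],
                   'D': [1, 2, 3, np.nan, np.nan]})

# Drop NaNs row by row; each row becomes a shorter Series indexed 0..k-1
shifted = df.apply(lambda row: pd.Series(row.dropna().to_numpy()), axis=1)

# Ensure all original column positions exist, even if every row had a NaN
shifted = shifted.reindex(columns=range(df.shape[1]))

# Restore the original column labels
shifted.columns = df.columns
print(shifted)
```

The reindex step matters when every row contains at least one NaN: without it, the result would have fewer columns than the input and assigning df.columns would fail.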
jezrael