I have a potentially large DataFrame:

     A    B_1    B_2    B_3    C_1    C_2    C_3
0  231  text2  text3    NaN  date4  date1    NaN
1  443  NaN    NaN    text1  date2    NaN    NaN
2  456  text1  text1  text2  NaN    date3  date1

To reduce the number of NaNs, I want to shift all the data to the left so that any column left entirely NaN can be dropped. The shift must stay within its column group, though: it does not matter whether a value ends up in B_1 or B_2, as long as it is not shifted into C_1, and so on.

This is what I want to end up with:

     A    B_1    B_2    B_3    C_1    C_2    
0  231  text2  text3    NaN  date4  date1
1  443  text1    NaN    NaN  date2    NaN
2  456  text1  text1  text2  date3  date1
ALollz
ealiaj

1 Answer

Apply a `justify` function to each group of columns; the only requirement is a MultiIndex in the columns:

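import numpy as np
import pandas as pd

# Move A into the index, then split 'B_1' into ('B', '1') to get a column MultiIndex
df = df.set_index('A')
df.columns = df.columns.str.split('_', expand=True)

# justify() is not part of pandas or NumPy; it has to be defined separately.
# Left-justify the values within each top-level column group (B, C)
f = lambda x: pd.DataFrame(justify(x.values, invalid_val=np.nan),
                           index=x.index, columns=x.columns)
df = df.groupby(axis=1, level=0).apply(f)
print(df)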
         B                    C            
         1      2      3      1      2    3
A                                          
231  text2  text3    NaN  date4  date1  NaN
443  text1    NaN    NaN  date2    NaN  NaN
456  text1  text1  text2  date3  date1  NaN

And then:

# Flatten the column MultiIndex back to 'B_1', 'B_2', ... and restore A as a column
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
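
The `justify` helper is used above but not defined in this answer. As a rough sketch of what such a function might look like (an assumption, not necessarily the helper the answer refers to), the version below left-justifies each row of a 2-D array. It always treats NaN/None as the missing marker and uses `invalid_val` only as the fill value, which is enough for the calls shown here:

```python
import numpy as np
import pandas as pd


def justify(a, invalid_val=np.nan):
    """Left-justify each row of a 2-D array (sketch; NaN/None count as missing)."""
    a = np.asarray(a, dtype=object)
    mask = pd.notna(a)  # True where a cell holds real data
    # Sorting booleans puts False first; reversing each row puts True first,
    # giving the positions the valid cells should occupy after the shift.
    justified_mask = np.sort(mask, axis=1)[:, ::-1]
    out = np.full(a.shape, invalid_val, dtype=object)
    # Boolean indexing walks both masks in row-major order, and each row has
    # the same number of True cells in both, so values stay in their own row.
    out[justified_mask] = a[mask]
    return out


b_block = np.array([["text2", "text3", np.nan],
                    [np.nan, np.nan, "text1"],
                    ["text1", "text1", "text2"]], dtype=object)
result = justify(b_block)
print(result)
```

The relative order of the valid values within a row is preserved; only the gaps move to the right.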

Combined with the solution from the previous answer:

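# Number repeated A values 1, 2, ... and pivot them into a column MultiIndex
g = df.groupby('A').cumcount() + 1
df1 = df.set_index(['A', g]).unstack()

# Left-justify within each top-level column group of the reshaped frame df1
f = lambda x: pd.DataFrame(justify(x.values, invalid_val=np.nan),
                           index=x.index, columns=x.columns)
df1 = df1.groupby(axis=1, level=0).apply(f)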

df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index()
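
For reference, here is a self-contained sketch of the same idea run on the data from the question. It is an alternative under stated assumptions, not the exact method above: it assumes grouped columns are named `<group>_<n>`, uses a hypothetical helper name `left_justify_rows`, and loops over column prefixes instead of calling `groupby(axis=1)`, which recent pandas releases have deprecated. It also drops the columns that end up entirely NaN, as the question asks:

```python
import numpy as np
import pandas as pd


def left_justify_rows(values):
    """Move the non-missing cells of each row to the left; pad with NaN."""
    values = np.asarray(values, dtype=object)
    mask = pd.notna(values)
    target = np.sort(mask, axis=1)[:, ::-1]  # valid positions, left-aligned
    out = np.full(values.shape, np.nan, dtype=object)
    out[target] = values[mask]
    return out


df = pd.DataFrame({
    "A": [231, 443, 456],
    "B_1": ["text2", np.nan, "text1"],
    "B_2": ["text3", np.nan, "text1"],
    "B_3": [np.nan, "text1", "text2"],
    "C_1": ["date4", "date2", np.nan],
    "C_2": ["date1", np.nan, "date3"],
    "C_3": [np.nan, np.nan, "date1"],
})

out = df.copy()
# Ordered, de-duplicated prefixes (B, C); "A" has no "_" and is left alone.
prefixes = list(dict.fromkeys(c.split("_")[0] for c in df.columns if "_" in c))
for prefix in prefixes:
    cols = [c for c in df.columns if c.startswith(prefix + "_")]
    out[cols] = left_justify_rows(df[cols].to_numpy())

# Drop columns that are now entirely NaN (C_3 for this data).
out = out.dropna(axis=1, how="all")
print(out)
```

Because each group is processed separately, a value can never cross from a B column into a C column.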
jezrael
  • In your combined solution, what effect does `df = df.groupby(axis=1, level=0).apply(f)` have? It does not seem to be used in `df1` – dearn44 Aug 06 '19 at 15:44