I have a dataframe and I'd like to insert a blank row as a separator whenever the value in the first column changes.

For example:

Column 1     Col2    Col3    Col4
A            s       b       d
A            s       j       k
A            b       d       q
B            b       a       d
C            l       k       p

becomes:

Column 1     Col2    Col3    Col4
A            s       b       d
A            s       j       k
A            b       d       q

B            b       a       d

C            l       k       p

because the value in Column 1 changed

The only way that I figured out how to do this is using VBA as indicated by the correctly marked answer here:

How to automatically insert a blank row after a group of data

But I need to do this in Python.

Any help would be really appreciated!

etuo

1 Answer

Create a helper DataFrame of empty rows whose index is the last row of each group plus .5, join it to the original with concat, sort the rows with sort_index, rebuild the default index with reset_index, and finally remove the trailing empty row by position with iloc:

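import pandas as pd

# True on the last row of each group (the next value differs or is missing)
mask = df['Column 1'].ne(df['Column 1'].shift(-1))
# Empty-string rows, indexed halfway between a group's last row and the next row
df1 = pd.DataFrame('', index=mask.index[mask] + .5, columns=df.columns)

# Interleave by index, renumber, then drop the empty row after the final group
df = pd.concat([df, df1]).sort_index().reset_index(drop=True).iloc[:-1]
print(df)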
  Column 1 Col2 Col3 Col4
0        A    s    b    d
1        A    s    j    k
2        A    b    d    q
3                        
4        B    b    a    d
5                        
6        C    l    k    p
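
For reuse, the same steps could be wrapped in a small function. This is only a sketch: the name insert_blank_rows and its key parameter are made up for illustration, the sample frame is rebuilt from the question's table, and it assumes the frame has the default 0..n-1 index.

```python
import pandas as pd

def insert_blank_rows(df, key):
    """Hypothetical helper: add an empty-string row after each run of equal values in `key`.

    Assumes df has a default 0..n-1 index.
    """
    # True on the last row of each run (the next value differs or is missing)
    mask = df[key].ne(df[key].shift(-1))
    # One empty row per run end, indexed halfway to the next row
    blanks = pd.DataFrame('', index=mask.index[mask] + 0.5, columns=df.columns)
    # Interleave by index, renumber, and drop the empty row after the final run
    return pd.concat([df, blanks]).sort_index().reset_index(drop=True).iloc[:-1]

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    'Column 1': list('AAABC'),
    'Col2': list('ssbbl'),
    'Col3': list('bjdak'),
    'Col4': list('dkqdp'),
})
result = insert_blank_rows(df, 'Column 1')
print(result)
```

Note that the separator rows hold empty strings rather than NaN, so every column ends up with string/object values.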
jezrael
  • This works beautifully, though one caveat is that if your data frame is in sorted order already, you will want to do a df = df.reset_index(drop=True) before the mask line if you don't already have that in your code. Otherwise the frame will be resorted on indices, rather than whatever criteria you used previously. – Scott Jul 27 '21 at 16:31
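
A minimal sketch of the caveat in that comment, using a small made-up frame: after sort_values the index labels are out of order, and because the technique interleaves rows with sort_index, skipping the reset would put the rows back in their pre-sort order. The variable name separated is an assumption for illustration.

```python
import pandas as pd

# Made-up frame; sorting leaves the index labels as [1, 3, 0, 2]
df = pd.DataFrame({'Column 1': ['B', 'A', 'C', 'A'],
                   'Col2': ['b', 's', 'l', 's']})
df = df.sort_values('Column 1', kind='stable')

# The step the comment recommends: restore a 0..n-1 index before building the mask
df = df.reset_index(drop=True)

mask = df['Column 1'].ne(df['Column 1'].shift(-1))
df1 = pd.DataFrame('', index=mask.index[mask] + .5, columns=df.columns)
separated = pd.concat([df, df1]).sort_index().reset_index(drop=True).iloc[:-1]
print(separated)
```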