Inserting blank line in dataframe when data changes

Question

I'm trying to insert a blank line into a dataframe whenever a value changes. However, my loop accumulates an extra step each time it runs, and I can't figure out why.

import numpy as np
import pandas as pd

# The blank row to insert
blank_row = [None, None]

# The first order number
first_order_number = df.iloc[0, 0]

# last_df is the working copy of df that receives the blank rows (created earlier, not shown)

# Loop over the rows of df and insert a blank row whenever the order number changes
for index, row in df.iterrows():
    if row['Purchase order'] != first_order_number:
        last_df = pd.DataFrame(np.insert(last_df.values, index, blank_row,
                                         axis=0), columns=last_df.columns)
        # Remember the new order number
        first_order_number = row['Purchase order']

Input data:

Purchase order  Store
795571          4
795571          4
795562          5
795562          5
795562          5 
795586          9
795586          9
795586          9
795588          10
795588          10
795588          10
795588          10

Expected output:

Purchase order  Store
795571          4
795571          4

795562          5
795562          5
795562          5 

795586          9
795586          9
795586          9

795588          10
795588          10
795588          10
795588          10

Actual output:

Purchase order  Store 
795571          4
795571          4

795562          5
795562          5

795562          5 
795586          9

795586          9
795586          9
795588          10

795588          10
795588          10
795588          10

My best guess is that when first_order_number is updated, it picks up the ['Purchase order'] value at the current index, and this one-step offset then becomes two steps the next time around. To fix this, I changed the line to:

first_order_number = df.iloc[index + 1, 0]

This just put the counter out of sync and filled the frame with None rows after store 5. How can I fix this?
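A minimal sketch of the same loop with the drift removed, assuming the sample data above and an object-dtype working array (so None can sit next to the integers). The position passed to np.insert comes from the original df, but every earlier insert has already pushed the later rows of the working array down by one, so the sketch adds the number of rows inserted so far. The counter name `inserted` is illustrative, not from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Purchase order": [795571, 795571, 795562, 795562, 795562, 795586,
                       795586, 795586, 795588, 795588, 795588, 795588],
    "Store": [4, 4, 5, 5, 5, 9, 9, 9, 10, 10, 10, 10],
})

blank_row = [None, None]

# Object dtype lets None be stored alongside the integers
values = df.to_numpy(dtype=object)
previous_order = df.iloc[0, 0]
inserted = 0  # how many blank rows have been inserted so far (illustrative name)

for index, row in df.iterrows():
    if row["Purchase order"] != previous_order:
        # Each earlier insert shifted this row down one position in `values`,
        # so offset the original index by the number of inserts made so far
        values = np.insert(values, index + inserted, blank_row, axis=0)
        inserted += 1
        previous_order = row["Purchase order"]

last_df = pd.DataFrame(values, columns=df.columns)
print(last_df.to_string(index=False))
```

The loop still runs row by row, so the vectorized approaches in the answers scale better on large frames.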

Maybe there is also a better way to achieve this, since I know loops over DataFrames are slow.

Many thanks for any input!

Does this answer your question? [Insert Blank Row In Python Data frame when value in column changes?](https://stackoverflow.com/questions/54105398/insert-blank-row-in-python-data-frame-when-value-in-column-changes) — norie, Jun 27 '21 at 12:26
@norie It looks like a short and sweet solution, but I can't get it to work. When I try it, I end up with a blank line after each row. Maybe I'm missing something; I don't fully understand what's happening in your example — G_olof, Jun 27 '21 at 12:52

score 2 · Accepted Answer · answered Jun 27 '21 at 14:15

Using the solution from https://stackoverflow.com/a/54105450/5660315, first mark the last row of each purchase order in mask. Then build df1, a frame of empty rows whose index values (last row + 0.5) fall between consecutive purchase orders. Finally, concatenate these rows with df, sort by index, reset the index, and drop the trailing empty row.

# From https://stackoverflow.com/a/54105450/5660315
# Insert empty line after change in first column value

from io import StringIO

import pandas as pd

data_text ="""
Purchase order,Store
795571,4
795571,4
795562,5
795562,5
795562,5 
795586,9
795586,9
795586,9
795588,10
795588,10
795588,10
795588,10
"""

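df = pd.read_csv(StringIO(data_text))

# True on the last row of each run of equal purchase orders (and on the final row)
mask = df['Purchase order'].ne(df['Purchase order'].shift(-1))

# One empty row per run, labelled halfway between that run's last row and the next row
df1 = pd.DataFrame('', index=mask.index[mask] + .5, columns=df.columns)

# Merge, sort the blank rows into place, renumber, and drop the trailing blank row
df = pd.concat([df, df1]).sort_index().reset_index(drop=True).iloc[:-1]
print(df.to_string(index=False))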
# Purchase order Store
#         795571     4
#         795571     4
#
#         795562     5
#         795562     5
#         795562     5
#
#         795586     9
#         795586     9
#         795586     9
#
#         795588    10
#         795588    10
#         795588    10
#         795588    10
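To see what the two intermediate objects hold, here is a small sketch on the same sample data (rebuilt with the DataFrame constructor instead of read_csv). mask is True on the last row of each run of equal purchase orders, including the final row, because shift(-1) leaves NaN there; that extra blank is why the answer ends with .iloc[:-1]. The +0.5 labels make sort_index place each blank row between one order's last row and the next order's first row. The name `out` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Purchase order": [795571, 795571, 795562, 795562, 795562, 795586,
                       795586, 795586, 795588, 795588, 795588, 795588],
    "Store": [4, 4, 5, 5, 5, 9, 9, 9, 10, 10, 10, 10],
})

# True where the next row holds a different purchase order (or there is no next row)
mask = df["Purchase order"].ne(df["Purchase order"].shift(-1))
print(mask.index[mask].tolist())  # [1, 4, 7, 11]

# One empty row per order, labelled halfway between that order's last row and the next row
df1 = pd.DataFrame("", index=mask.index[mask] + 0.5, columns=df.columns)
print(df1.index.tolist())  # [1.5, 4.5, 7.5, 11.5]

# Sorting by the float labels interleaves the blank rows; the last one is dropped
out = pd.concat([df, df1]).sort_index().reset_index(drop=True).iloc[:-1]
print(out.to_string(index=False))
```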

score 1 · Answer 2 · answered Jun 27 '21 at 12:22

Try this:

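import pandas as pd
import numpy as np

# Read the table copied from the question (columns separated by 2+ spaces)
df = pd.read_clipboard(sep=r'\s\s+')

blank_row = [np.nan, np.nan]

l = []

# Append each purchase-order group followed by a one-row blank frame
for _, g in df.groupby('Purchase order'):
    l.append(g)
    l.append(pd.DataFrame([blank_row], columns=df.columns, index=[0]))

# Stack everything, renumber the rows, and drop the final blank row
df_out = pd.concat(l, ignore_index=True).iloc[:-1]
df_out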

Output:

   Purchase order  Store
0         795562.0    5.0
1         795562.0    5.0
2         795562.0    5.0
3              NaN    NaN
4         795571.0    4.0
5         795571.0    4.0
6              NaN    NaN
7         795586.0    9.0
8         795586.0    9.0
9         795586.0    9.0
10             NaN    NaN
11        795588.0   10.0
12        795588.0   10.0
13        795588.0   10.0
14        795588.0   10.0
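Note that groupby sorts the keys by default, which is why 795562 comes out first above, and it would also merge non-adjacent rows that share a purchase order. A sketch that keeps the input order and only splits where the value actually changes, assuming the same column names: build a run id with ne(shift()).cumsum(), which increases by one at every change, and group on that instead. The name `run_id` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Purchase order": [795571, 795571, 795562, 795562, 795562, 795586,
                       795586, 795586, 795588, 795588, 795588, 795588],
    "Store": [4, 4, 5, 5, 5, 9, 9, 9, 10, 10, 10, 10],
})

blank_row = [np.nan, np.nan]

# Starts a new id whenever the purchase order differs from the row above
run_id = df["Purchase order"].ne(df["Purchase order"].shift()).cumsum()

pieces = []
for _, group in df.groupby(run_id):
    pieces.append(group)
    pieces.append(pd.DataFrame([blank_row], columns=df.columns))

# Drop the blank row added after the final group
df_out = pd.concat(pieces, ignore_index=True).iloc[:-1]
print(df_out)
```

Because the run ids are increasing in input order, the default sorted groupby already returns the groups in their original order.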

Thanks, I'll try this out! Are the "Store" and "Purchase order" columns converted to floats by the numpy operation? How do I combat this? — G_olof, Jun 27 '21 at 12:33
Unfortunate side effect of pandas for columns with NaN. However, we can use a new Pandas dtype to combat this. — Scott Boston, Jun 27 '21 at 13:16
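A minimal sketch of that idea, assuming pandas 1.0 or later and that the dtype meant is the nullable integer dtype "Int64" (capital I): convert after the concat, and the blank rows show <NA> while the other values go back to integers. The small frame below stands in for the float output shown above.

```python
import numpy as np
import pandas as pd

# Stand-in for the float columns produced by the concat above
df_out = pd.DataFrame({
    "Purchase order": [795571.0, 795571.0, np.nan, 795562.0],
    "Store": [4.0, 4.0, np.nan, 5.0],
})

# "Int64" is pandas' nullable integer dtype; NaN becomes <NA>
df_out = df_out.astype("Int64")
print(df_out)
print(df_out.dtypes)
```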

score 0 · Answer 3 · answered Jun 27 '21 at 12:25 · edited Jun 27 '21 at 13:31 · letdatado

What is the criterion for grouping certain rows and separating the groups with a blank line? EDIT: Use this:

# Flag every row whose value in column 0 already appeared further up
data[2] = data.duplicated(subset=0)
# For each first occurrence, write ' ' into the row labelled index - 1 (the row above it)
for index in data.loc[data[2] == False].index - 1:
    data.loc[index] = ' '

Replace 'data' with your DataFrame's name.

Thanks!

It should be grouped by column 'Purchase order'. Whenever that value is not the same as the value above, it should insert a blank line – G_olof Jun 27 '21 at 12:30
Alright, I'm working on it. I'll get back to you once I'm done – letdatado Jun 27 '21 at 12:48
I have edited my previous post with the code. Do try it and give feedback, thanks – letdatado Jun 27 '21 at 13:35
