2

I have the following dataset:

input_df = pd.DataFrame({'Product':['Computer']*5 + ['Television']*7,
                   'Module':['Display']*5 + ['Power Supply']*7,
                 'TTF':[1,2,3,4,6,1,2,3,4,5,7,8]})

I would like to insert missing rows (index 4 and 11) in order to get the following dataset:

output_df = pd.DataFrame({'Product':['Computer']*6 + ['Television']*8,
                   'Module':['Display']*6 + ['Power Supply']*8,
                 'TTF':[1,2,3,4,5,6,1,2,3,4,5,6,7,8]})

What is the most efficient way to insert those rows (my real dataset is actually pretty big with a lot of different categories).

I have found a related post: How would I insert missing rows into this data set? However in this post, the index range doesn't vary from one product to another (always [1 to 8] unlike in my case where it is [1 to 6] for Computer and then [1 to 8] for Television.

Liky
  • 1,105
  • 2
  • 11
  • 32

1 Answers1

2

Use DataFrame.groupby with DataFrame.reindex per groups:

df = (input_df.set_index('TTF')
              .groupby(['Product','Module'], group_keys=False)
              .apply(lambda x: x.reindex(range(x.index.min(), 
                                               x.index.max() + 1), method='ffill'))
              .reset_index()
             )
print (df)
    TTF     Product        Module
0     1    Computer       Display
1     2    Computer       Display
2     3    Computer       Display
3     4    Computer       Display
4     5    Computer       Display
5     6    Computer       Display
6     1  Television  Power Supply
7     2  Television  Power Supply
8     3  Television  Power Supply
9     4  Television  Power Supply
10    5  Television  Power Supply
11    6  Television  Power Supply
12    7  Television  Power Supply
13    8  Television  Power Supply
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252