5

I have a df like:

ID       value1   start     end
 1         100     1        2       
 1         200     2        4
 2         50      1        1
 2         30      3        5 

I want to fill in new rows for each value between start and end

The final df should look like this where period is each unique value between start and end

    ID       value1      period     
     1         100     1        
     1         100     2            
     1         200     2 
     1         200     3 
     1         200     4       
     2         50      1        
     2         30      3    
     2         30      4 
     2         30      5     
U13-Forward
  • 69,221
  • 14
  • 89
  • 114
user2963882
  • 625
  • 1
  • 8
  • 19

2 Answers2

1

After using for loop create the list of column with range , this become an unnesting problem

df['New']=[list(range(x,y+1)) for x , y in zip(df.start,df.end)]
Yourdf=unnesting(df,['New'])
Yourdf
   New  ID  value1  start  end
0    1   1     100      1    2
0    2   1     100      1    2
1    2   1     200      2    4
1    3   1     200      2    4
1    4   1     200      2    4
2    1   2      50      1    1
3    3   2      30      3    5
3    4   2      30      3    5
3    5   2      30      3    5
BENY
  • 317,841
  • 20
  • 164
  • 234
1

Or use the below, list comprehension:

>>> df['period']=[','.join(map(str,range(x,y+1))) for x,y in zip(df['start'],df['end'])]
>>> df.set_index(['ID','value1','start','end']).stack().str.split(',', expand=True).stack().unstack(-2).reset_index(-1,drop=True).reset_index()
   ID  value1  start  end period
0   1     100      1    2      1
1   1     100      1    2      2
2   1     200      2    4      2
3   1     200      2    4      3
4   1     200      2    4      4
5   2      30      3    5      3
6   2      30      3    5      4
7   2      30      3    5      5
8   2      50      1    1      1
>>> 

If you want to drop columns:

>>> df['period']=[','.join(map(str,range(x,y+1))) for x,y in zip(df.pop('start'),df.pop('end'))]
>>> df.set_index(['ID','value1']).stack().str.split(',', expand=True).stack().unstack(-2).reset_index(-1,drop=True).reset_index()
   ID  value1 period
0   1     100      1
1   1     100      2
2   1     200      2
3   1     200      3
4   1     200      4
5   2      30      3
6   2      30      4
7   2      30      5
8   2      50      1
>>> 
U13-Forward
  • 69,221
  • 14
  • 89
  • 114