pandas: for each row in df copy row N times with slight changes

Question

So I have a DataFrame like this:

   N    start
1  1    08/01/2014 9:30:02
2  1    08/01/2014 10:30:02 
3  2    08/01/2014 12:30:02
4  3    08/01/2014 4:30:02

and I need to duplicate each row N times, adding one hour to start each time, like this:

   N    start
1  1    08/01/2014 9:30:02
2  1    08/01/2014 10:30:02 
3  2    08/01/2014 12:30:02
3  2    08/01/2014 13:30:02
4  3    08/01/2014 4:30:02
4  3    08/01/2014 5:30:02
4  3    08/01/2014 6:30:02

how can I do it within pandas?

unutbu · Accepted Answer · 2015-08-16T19:35:15.277

9

You could use reindex to expand the DataFrame, and TimedeltaIndex to add the hours:

import pandas as pd
df = pd.DataFrame({'N': [1, 1, 2, 3],
                   'start': ['08/01/2014 9:30:02',
                             '08/01/2014 10:30:02',
                             '08/01/2014 12:30:02',
                             '08/01/2014 4:30:02']})
df['start'] = pd.to_datetime(df['start'])
df = df.reindex(np.repeat(df.index.values, df['N']), method='ffill')
df['start'] += pd.TimedeltaIndex(df.groupby(level=0).cumcount(), unit='h')

which yields

   N               start
0  1 2014-08-01 09:30:02
1  1 2014-08-01 10:30:02
2  2 2014-08-01 12:30:02
2  2 2014-08-01 13:30:02
3  3 2014-08-01 04:30:02
3  3 2014-08-01 05:30:02
3  3 2014-08-01 06:30:02

edited Aug 16 '15 at 19:35

answered Aug 16 '15 at 19:03

unutbu

842,883
184
1,785
1,677

Thank you! The only thing I cannot understand now : in the second one you used **for i in range(1,4):** - as far as i've got it, I need to replace it with custom numbers, by, lets say, calculating the max N in df, and then checking: if row.N<= maxN, then proceed. Is that right? – Philipp_Kats Aug 16 '15 at 19:20
1

I think I would use the first method. The second method assumed `N` was constant. Since that is not the case, it would be difficult to concatenate copies of `df` properly. – unutbu Aug 16 '15 at 19:39
This method is considerably faster than using the append method in this answer: http://stackoverflow.com/a/24029921/698809 – Jason Bellino Apr 14 '17 at 17:27

score 1 · Answer 2 · answered Aug 16 '15 at 19:20

This may not be the most efficient way but will get you the results:

import pandas as pd
l = []
for index,item in df.iterrows():
    l.append([item[0],pd.to_datetime(item[1])])
    i=1
    # it was not clear if you want to repeat based on N or the index... if index then replace item[0] with index
    while i<item[0]:
        l.append([item[0],pd.to_datetime(item[1])+pd.Timedelta('1 hours')])
        i=i+1
dfResult = pd.DataFrame(l,columns=['N','Start'])

thanks. I did it that way this time. But of course it is a superslow approach — Philipp_Kats, Aug 16 '15 at 19:43

pandas: for each row in df copy row N times with slight changes

2 Answers2