1

I have a dataframe where I have a column having the group no. Under each group there are multiple processes associated which has a specific start and end time.

It looks like the following:

Group   |  Process  |  StartTime          |  EndTime            |
-----------------------------------------------------------------
  1     |     A     | 2023-01-01 10:09:18 | 2023-01-01 11:19:28 |
  1     |     B     | 2023-01-01 11:29:01 | 2023-01-01 19:29:00 |
  1     |     C     | 2023-01-01 19:56:11 | 2023-01-02 01:09:10 |
  2     |     A     | 2023-02-14 23:54:11 | 2023-02-15 04:01:14 |
  2     |     B     | 2023-02-14 05:56:11 | 2023-02-14 09:00:20 |
  2     |     D     | 2023-02-14 10:16:01 | 2023-02-14 21:06:30 |

All I want to do is for each group I want to resample the dataframe with a frequency of 1 minute with the start and end time.

For ex. for Group 1 for Process A, I will have rows starting from 01-01-2023 10:09 to 11:20, sample at a frequency of 1min, which is df.resample('1T')

Group   |  Process      | Sample Timestamp    |  StartTime          |  EndTime           |
    --------------------------------------------------------------------------------------
      1     |     A     | 2023-01-01 10:09:00 | 2023-01-01 10:09:18 | 2023-01-01 11:19:28|
      1     |     A     | 2023-01-01 10:10:00 | 2023-01-01 10:09:18 | 2023-01-01 11:19:28| 
      1     |     A     | 2023-01-01 10:11:00 | 2023-01-01 10:09:18 | 2023-01-01 11:19:28| 
      ....  |    ...    |  ...                |    ...              |   ...              |
      1     |     A     | 2023-01-01 11:18:00 | 2023-01-01 10:09:18 | 2023-01-01 11:19:28|
      1     |     A     | 2023-01-01 11:19:00 | 2023-01-01 10:09:18 | 2023-01-01 11:19:28|
      1     |     B     | 2023-01-01 11:29:00 | 2023-01-01 11:29:01 | 2023-01-01 19:29:00|
      1     |     B     | 2023-01-01 11:30:00 | 2023-01-01 11:29:01 | 2023-01-01 19:29:00|
      ....  |     ...   |    ...              |     ...             |   ...              |
      1     |     B     | 2023-01-01 19:28:00 | 2023-01-01 11:29:01 | 2023-01-01 19:29:00|
      1     |     B     | 2023-01-01 19:29:00 | 2023-01-01 11:29:01 | 2023-01-01 19:29:00|
      < same for Process C and other Groups as well>

As a reference I tried this piece of code over here: Reference Code

But unfortunately, I am unable to implement it by each group.

Any help is appreciated.

Debadri Dutta
  • 1,183
  • 1
  • 13
  • 39

1 Answers1

0

You can use solution without resample:

#convert both columns to datetimes
df['StartTime'] = pd.to_datetime(df['StartTime'])
df['EndTime'] = pd.to_datetime(df['EndTime'])

#repeat indices by difference of columns in minutes
df1 = df.loc[df.index.repeat(df['EndTime'].sub(df['StartTime']).dt.total_seconds() / 60+1)]

#create counter by duplicated indices and convert to minute timedeltas
s = pd.to_timedelta(df1.groupby(level=0).cumcount(), unit='Min')

#insert new column with remove seconds from datetimes
df1.insert(2, 'Sample Timestamp', df['StartTime'].dt.floor('Min') + s)
print (df1)
    Group Process    Sample Timestamp           StartTime             EndTime
0       1       A 2023-01-01 10:09:00 2023-01-01 10:09:18 2023-01-01 11:19:28
0       1       A 2023-01-01 10:10:00 2023-01-01 10:09:18 2023-01-01 11:19:28
0       1       A 2023-01-01 10:11:00 2023-01-01 10:09:18 2023-01-01 11:19:28
0       1       A 2023-01-01 10:12:00 2023-01-01 10:09:18 2023-01-01 11:19:28
0       1       A 2023-01-01 10:13:00 2023-01-01 10:09:18 2023-01-01 11:19:28
..    ...     ...                 ...                 ...                 ...
5       2       D 2023-02-14 21:02:00 2023-02-14 10:16:01 2023-02-14 21:06:30
5       2       D 2023-02-14 21:03:00 2023-02-14 10:16:01 2023-02-14 21:06:30
5       2       D 2023-02-14 21:04:00 2023-02-14 10:16:01 2023-02-14 21:06:30
5       2       D 2023-02-14 21:05:00 2023-02-14 10:16:01 2023-02-14 21:06:30
5       2       D 2023-02-14 21:06:00 2023-02-14 10:16:01 2023-02-14 21:06:30

[1948 rows x 5 columns]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • For some of the groups the timestamp is getting mixed up. not sure why, but timestamp of Group 2 is sometimes being mapped to timestamp of group 3 and so on for some of the groups – Debadri Dutta Feb 27 '23 at 08:49
  • @DebadriDutta - My solution processing each row separately. Are you sure? – jezrael Feb 27 '23 at 08:50
  • @DebadriDutta - One thing - Can you add `df = df.reset_index(drop=True)` before my solution? It should help. – jezrael Feb 27 '23 at 08:51
  • 1
    there were time overlaps for some groups, hence the confusion, the solution is alright, thanks and apologies for the confusion to you as well – Debadri Dutta Mar 01 '23 at 10:06