I have a pandas DataFrame with HTML video event data: the type of action on the video (play, timeupdate, pause), the datetime of each event (per second), and the seconds of video watched.

I calculated the ranges of consecutive seconds watched, so that for each video I get a range such as [(211, 215)]; however, in some events the user skipped parts of the video, so the event has multiple watched sub-ranges, as in [[[142, 147], [132, 141], [148, 154]]].

In my DataFrame, I have a column "start_second" and a column "end_second" for all events; so for the previous example:

import pandas as pd

# One event (row) whose lists hold the paired start/end of each sub-range
a = pd.DataFrame([{"end_second": [147, 142, 154],
                   "start_second": [142, 132, 148]}])

When an event has more than one sub-range, I need to explode the lists start and end into rows, so that the final result keeps the corresponding start_second and end_second on the same row.

I have tried pandas df.explode('col') and other functions, such as the ones suggested in How to unnest (explode) a column in a pandas DataFrame?, but none of them works as it should.

Is there any way df.explode can take more than one column at a time while keeping the list elements in order?
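
For reference, a minimal sketch of what such a paired explode could look like. It assumes pandas 1.3 or later, where DataFrame.explode accepts a list of columns, and it assumes the two lists in each row always have the same length (explode raises an error otherwise). The name `exploded` is just illustrative.

```python
import pandas as pd

# Same example frame as in the question: one event with three sub-ranges
a = pd.DataFrame([{"end_second": [147, 142, 154],
                   "start_second": [142, 132, 148]}])

# Assumption: pandas >= 1.3, where explode() accepts a list of columns.
# The lists are exploded together, element by element, so the i-th start
# stays on the same row as the i-th end (no cross product of combinations).
exploded = a.explode(["start_second", "end_second"], ignore_index=True)

print(exploded)
```

This yields one row per sub-range rather than every start/end combination, so nothing has to be filtered out afterwards. The exploded columns come back with object dtype, so a cast such as `exploded.astype(int)` may be wanted. On older pandas versions, `a.apply(pd.Series.explode)` is a commonly used alternative under the same equal-length assumption.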

Thank you in advance for any help!

Sveva
  • You can try the unnesting function: `unnesting(df, ['end_second','start_second'], axis=1)` – BENY Jul 02 '20 at 18:16
  • Hello, thank you, I tried them, but they return all possible combinations that I would then need to filter out to find the one keeping the correct start and end in the same row. The dataset is large (around 500,000 rows) and getting all these duplicates to work on is terrible in terms of performance. – Sveva Jul 02 '20 at 18:31

0 Answers