I have a large dataset consisting of records of objects travelling within a network. The dataset is stored in a pandas dataset, which roughly looks like this:
Obj ID | Timestamp | x | y | link ID
-------|-----------|---|---|---------
123 | 506 |123|456| ab12
123 | 517 |129|436| ab12
123 | 519 |125|454| cd34
126 | 501 |123|426| gk77
126 | 505 |153|453| gk77
126 | 507 |129|856| bj88
126 | 508 |143|496| bj88
126 | 512 |125|427| gk77
126 | 515 |153|453| gk77
126 | 518 |127|256| ab12
The dataframe has been sorted by Obj ID
, and each block of records belonging to an object has been sorted by time (the Timestamp
field in seconds). Two consecutive rows with the same Obj ID
and link ID
represent the start and end of an object of that link. The object eventually ends at a link, denoted by a single link ID appearing at the end of the object's record.
Here's the desired output. Condense the start-record and the end-record of one object's visit to one link into one record. For the ending-links of each object, just fill in StartTime
for EndTime
.
Obj ID | StartTime | EndTime | link ID
-------|-----------|---------|---------
123 | 506 | 517 | ab12
123 | 519 | 519 | cd34
126 | 501 | 505 | gk77
126 | 507 | 508 | bj88
126 | 512 | 515 | gk77
126 | 518 | 518 | ab12
Note:
Cannot guarantee that an object will not travel onto the same link twice. But each visit to the link needs to be a separate record.
A naive loop solution that I implemented ran out of memory, due to the dataset being quite large.
EDIT: I edited the dataset to include the situation mentioned in the first point of my "notes".