I am working through a problem from the VAST 2015 challenge (http://vacommunity.org/VAST+Challenge+2015).
I have a large df with data points tracking (fictitious) individuals as they move around a theme park. The individuals each have a unique ID, and the event type is either 'check-in' or 'movement'. There are circa 500-1000 data points for each user ID (ie. multiple check-ins and multiple movements recorded for each ID). The df has the following columns;
Timestamp ID Type X-coordinate Y-Coordinate
0 2014-6-07 08:00:08 1102394 check-in 99 77
...
21 2014-6-07 08:00:15 1102394 movement 99 76 etc
There are 6411 unique IDs in the dataframe. However, from creating two new dataframes;
df_checkin = df.loc[df['type'] == "check-in"]
df_movement = df.loc[df['type'] == "movement"]
and performing ID
value_counts
across these, it shows that whilst 6411
of the IDs record a 'movement', only 6410
record a 'check-in'.
How can I compare these value counts (or go back to the original df), to identify which of the 6411
IDs
present in the dataset recorded movement but not a check-in?
Thanks!
Edited to include more data;
Movement df
Timestamp id type X Y
21 2014-6-07 08:00:27 657863 movement 7 43
27 2014-6-07 08:00:35 657863 movement 8 43
28 2014-6-07 08:00:36 1187304 movement 63 98
32 2014-6-07 08:00:39 1187304 movement 63 97
33 2014-6-07 08:00:42 1187304 movement 64 96
... ... ... ... ... ...
9078618 2014-6-07 23:35:24 694782 movement 63 99
9078619 2014-6-07 23:35:24 800018 movement 63 99
9078620 2014-6-07 23:35:24 9591 movement 63 99
9078621 2014-6-07 23:35:24 963248 movement 63 99
9078622 2014-6-07 23:35:24 972171 movement 63 99
Check-in df
Timestamp id type X Y
0 2014-6-07 08:00:08 1102394 check-in 99 77
1 2014-6-07 08:00:08 1187304 check-in 63 99
2 2014-6-07 08:00:08 1363700 check-in 99 77
3 2014-6-07 08:00:10 1449032 check-in 63 99
4 2014-6-07 08:00:10 279658 check-in 63 99
... ... ... ... ... ...
9044493 2014-6-07 23:04:36 90727 check-in 42 37
9044495 2014-6-07 23:04:37 878118 check-in 42 37
9044496 2014-6-07 23:04:38 2044279 check-in 60 37
9044497 2014-6-07 23:04:39 1933890 check-in 60 37
9044498 2014-6-07 23:04:39 1962745 check-in 60 37
Movement Value Counts
999107 2549
1723967 2528
685884 2523
697057 2518
1928560 2382
...
1523741 343
1306211 326
1187304 309
590879 248
657863 73
Name: id, Length: 6411, dtype: int64
Check-in value counts
999107 43
1349992 41
204640 40
1928560 40
1285841 40
..
953838 1
1458915 1
1748887 1
1680161 1
417205 1
Name: id, Length: 6410, dtype: int64