I currently have a pandas DataFrame (which I will call validations)
with the following columns:
|------|-------------|-------|------------------|------------|------|
| line | orientation | route | validationDate   | cardNumber | stop |
|------|-------------|-------|------------------|------------|------|
| 1    | 2           | 2     | 1994-01-18,18:00 | O219838111 | 2393 |
| 1    | 1           | 1     | 1994-01-18,18:03 | O211233111 | 2400 |
| ...  | ...         | ...   | ...              | ...        | ...  |
|------|-------------|-------|------------------|------------|------|
My goal is to find all validations that are connected, that is, pairs of entries with the same cardNumber
that took place on the same day, regardless of whether they share the same line, orientation, bus stop or route.
My "grouping" skills are a bit limited, so I haven't come up with anything better than one big loop over every pair of rows using
itertools.product(validations.iterrows(), validations.iterrows())
As expected, this compares every row with every other row, so the work grows quadratically with the number of rows and it simply takes too long.
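For reference, here is a minimal runnable sketch of roughly what I am doing now. The sample rows are made up for illustration (only the first two come from the table above). The `day` helper column and the `connected` list are names I introduced just for this example, and I am assuming validationDate is stored as a string in the format shown.

```python
import itertools

import pandas as pd

# Hypothetical sample data in the same shape as the table above.
# The last two rows are invented so that one connected pair exists.
validations = pd.DataFrame({
    "line": [1, 1, 3, 1],
    "orientation": [2, 1, 1, 2],
    "route": [2, 1, 1, 2],
    "validationDate": [
        "1994-01-18,18:00",
        "1994-01-18,18:03",
        "1994-01-18,19:10",
        "1994-01-19,08:00",
    ],
    "cardNumber": ["O219838111", "O211233111", "O219838111", "O219838111"],
    "stop": [2393, 2400, 2410, 2393],
})

# Assumption: dates are strings like "1994-01-18,18:00".
validations["validationDate"] = pd.to_datetime(
    validations["validationDate"], format="%Y-%m-%d,%H:%M"
)
# Helper column (not in the real data): the calendar day of each validation.
validations["day"] = validations["validationDate"].dt.date

# Brute force: compare every row with every other row.
connected = []
for (i, a), (j, b) in itertools.product(
    validations.iterrows(), validations.iterrows()
):
    if i >= j:
        continue  # skip self-pairs and mirrored duplicates
    if a["cardNumber"] == b["cardNumber"] and a["day"] == b["day"]:
        connected.append((i, j))

print(connected)
```

On a handful of rows this is fine, but on the real DataFrame the nested iteration is the part that never finishes in a reasonable time.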
Any ideas?
Thanks in advance!