I have a DataFrame `df` with 950 rows. Let's pretend its columns are `timestamp`, `quantity`, `event` and `file`; that is a good approximation of `df`. I want to:
- select all rows where `event` is `this_event` and `file` is `this_file`;
- from those, drop any row that has the same `timestamp` as a row where `file` is `my_file`, but only if the two rows' `quantity` values match.
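A minimal sketch of the two steps as literally stated in the list above. It assumes `this_event`, `this_file` and `my_file` are plain string values in the `event` and `file` columns (they are placeholder names), and it uses a made-up four-row frame, not real data. The second step is an anti-join: rows whose `(timestamp, quantity)` pair also appears among the `my_file` rows are dropped.

```python
import pandas as pd

# Made-up rows for illustration; "this_event", "this_file" and "my_file"
# are the question's placeholder names, used here as hypothetical values.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-10-17 02:01:00", "2018-10-17 02:01:00",
        "2018-10-17 02:04:00", "2018-10-17 02:04:00",
    ]),
    "quantity": [7, 7, 5, 3],
    "event": ["this_event", "other_event", "this_event", "this_event"],
    "file": ["this_file", "my_file", "this_file", "my_file"],
})

# Step 1: keep only rows whose event and file match the wanted values.
selected = df[(df["event"] == "this_event") & (df["file"] == "this_file")]

# Step 2: collect the (timestamp, quantity) pairs found in my_file rows.
# drop_duplicates keeps the left merge one-to-one, so row count is unchanged.
keys = ["timestamp", "quantity"]
ref = df.loc[df["file"] == "my_file", keys].drop_duplicates()

# A left merge with indicator=True marks each selected row as "both"
# (pair found in my_file) or "left_only" (no match); keep the latter.
flagged = selected.merge(ref, on=keys, how="left", indicator=True)
result = selected[(flagged["_merge"] == "left_only").to_numpy()]
print(result)
```

Here the row at 02:01:00 is dropped because a `my_file` row shares its timestamp and quantity 7, while the 02:04:00 row (quantity 5) survives.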
How do I do that? I'm really struggling with this and don't know how to approach it.
EDIT:
Example data:
timestamp, event, quantity, file
2018-10-17 02:01:00, slept, 7, base
2018-10-17 02:01:00, slept, 7, temp
2018-10-17 02:01:00, slept, 9, base
2018-10-17 02:04:00, studied, 5, temp
2018-10-17 02:04:00, farted, 7, temp
2018-10-17 02:04:00, drank, 1, base
2018-10-17 02:04:00, exercised, 8, base
2018-10-17 02:04:00, slept, 7, base
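To make the example reproducible, one way to load it is a sketch with `pandas.read_csv`, assuming the data is exactly the comma-plus-space text shown. `skipinitialspace=True` strips the blank after each comma so column names and values come out clean, and `parse_dates` turns `timestamp` into a datetime column.

```python
import pandas as pd
from io import StringIO

# The example data as pasted, with ", " separators.
raw = """timestamp, event, quantity, file
2018-10-17 02:01:00, slept, 7, base
2018-10-17 02:01:00, slept, 7, temp
2018-10-17 02:01:00, slept, 9, base
2018-10-17 02:04:00, studied, 5, temp
2018-10-17 02:04:00, farted, 7, temp
2018-10-17 02:04:00, drank, 1, base
2018-10-17 02:04:00, exercised, 8, base
2018-10-17 02:04:00, slept, 7, base
"""

# skipinitialspace removes the space after each comma (header included);
# parse_dates converts the timestamp strings to datetime64 values.
df = pd.read_csv(StringIO(raw), skipinitialspace=True, parse_dates=["timestamp"])
print(df.dtypes)
```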
So, for example, I will always keep records that come from file `base`. This is a deliberate bias: those records must never be removed. I want to delete any record from any other file (here `temp`) whose `timestamp` and `event` are the same as those of a `base` record, but only when its `quantity` is also the same as that of a `base` entry at the same timestamp.
So, in this example data, I would expect the code to identify the second row and remove it, because its quantity, 7, is the same as one of the two other `base` rows at 02:01:00.
The code would not delete anything at 02:04:00, because no rows there share both the same timestamp and the same event string (the events are all unique).
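A minimal sketch of the rule described in the edit, under one assumption: a non-`base` row is removed only when a single `base` row matches it on `timestamp`, `event` and `quantity` together. (The wording could also mean the quantity may come from a different `base` row at that timestamp; on this example data both readings remove the same row.) The helper name `drop_base_duplicates` is hypothetical. It anti-joins against the deduplicated `base` keys and always keeps `base` rows.

```python
import pandas as pd


def drop_base_duplicates(df, base_file="base", keys=("timestamp", "event", "quantity")):
    """Hypothetical helper: drop non-base rows whose key columns match a base row."""
    keys = list(keys)
    # Unique key combinations present in the protected base rows.
    base_keys = df.loc[df["file"] == base_file, keys].drop_duplicates()
    # Left merge keeps df's row order and length (base_keys has no duplicates);
    # "_merge" is "both" where the row's keys also occur in base.
    flagged = df.merge(base_keys, on=keys, how="left", indicator=True)
    matches_base = (flagged["_merge"] == "both").to_numpy()
    is_base = (df["file"] == base_file).to_numpy()
    # Keep every base row, plus any other row with no matching base row.
    return df[is_base | ~matches_base]


# The example data from the question.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-10-17 02:01:00"] * 3 + ["2018-10-17 02:04:00"] * 5),
    "event": ["slept", "slept", "slept", "studied", "farted", "drank", "exercised", "slept"],
    "quantity": [7, 7, 9, 5, 7, 1, 8, 7],
    "file": ["base", "temp", "base", "temp", "temp", "base", "base", "base"],
})

result = drop_base_duplicates(df)
print(result)
```

Only index 1 (the `temp` row at 02:01:00 with quantity 7) is dropped; nothing at 02:04:00 is touched, since no `temp` event there matches a `base` event. Matching on all three columns at once avoids a per-timestamp loop, which should be fine for 950 rows.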