I have the following df:
id time x y pickup_dropoff
1 2/1/2013 12:23 73 40 pickup
1 2/1/2013 12:25 73 40.2 ping
1 2/1/2013 12.27 73 40.5 ping
1 2/1/2013 12:34 73 41 dropoff
1 2/1/2013 12:35 73 41.4 ping
1 1/1/2013 12:45 73.6 41 pickup
1 1/1/2013 12:57 73.5 41 dropoff
2 1/2/2013 12:54 73.6 42 ping
2 1/2/2013 13:00 73.45 42 pickup
2 1/2/2013 14:00 73 42 dropoff
2 1/2/2013 14:50 73.11 41 pickup
2 1/2/2013 15:30 73 44 dropoff
2 1/2/2013 16:00 73.1 41 pickup
2 1/2/2013 18:00 74 42 dropoff
Thanks to the help I received in this post: Reshape Data partially from Wide to Long in R
I was able to reshape the data to resemble the above. I'm now looking to recode some of the factor values to show when a vehicle is in use or is cruising without being in use. This new variable would make the following assumptions:
- if a ping is between a pickup and a dropoff, the vehicle is in use
- if a ping is between a dropoff and a pickup, it's out of use
I'd like the output to look like the following:
id time x y pickup_dropoff status
1 2/1/2013 12:23 73 40 pickup pickup
1 2/1/2013 12:25 73 40.2 ping inuse
1 2/1/2013 12.27 73 40.5 ping inuse
1 2/1/2013 12:34 73 41 dropoff dropoff
1 2/1/2013 12:35 73 41.4 ping nouse
1 1/1/2013 12:45 73.6 41 pickup pickup
1 1/1/2013 12:57 73.5 41 dropoff dropoff
2 1/2/2013 12:54 73.6 42 ping unknown
2 1/2/2013 13:00 73.45 42 pickup pickup
2 1/2/2013 14:00 73 42 dropoff dropoff
2 1/2/2013 14:50 73.11 41 pickup pickup
2 1/2/2013 15:30 73 44 dropoff dropoff
2 1/2/2013 16:00 73.1 41 pickup pickup
2 1/2/2013 18:00 74 42 dropoff dropoff
I currently have pickup_dropoff coded as a factor with 3 levels.
One solution I am playing with is adding a column with the factor levels coded as 1, 2, 3, using as.numeric to turn them into numbers, and then writing a couple of ifelse statements like the following:
# pseudocode: "LIES BETWEEN" is the part I don't know how to express in R
df$status = ifelse(df$pickup_dropoff LIES BETWEEN 3 and 1, "inuse", as.character(df$pickup_dropoff))
I may be overthinking this, but I'm not sure if there is a way to say "in between" in R. I also have to deal with another dimension, id, since I don't want a ping between two different ids to be considered in use. In that case it should be considered "unknown", as the data I am working with is incomplete.
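For reference, here is a minimal base R sketch of one way the "in between" rule might be expressed. It is not a definitive solution. It assumes rows are already in the order shown above within each id, and it only rebuilds the id and pickup_dropoff columns. The helpers carry_forward and label_status are hypothetical names made up for this illustration. For every ping it looks up the nearest non-ping event before and after it within the same id, and labels the ping "unknown" when either side is missing.

```r
# Sample data from the question (only the two columns the rule needs).
df <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  pickup_dropoff = c("pickup", "ping", "ping", "dropoff", "ping", "pickup",
                     "dropoff", "ping", "pickup", "dropoff", "pickup",
                     "dropoff", "pickup", "dropoff"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the most recent non-NA value forward.
# Positions before the first non-NA value stay NA.
carry_forward <- function(v) {
  idx <- cumsum(!is.na(v))            # 0 until the first non-NA value
  c(NA, v[!is.na(v)])[idx + 1]
}

# Hypothetical helper: label one id's events, in row order.
label_status <- function(ev) {
  ev <- as.character(ev)              # also works if the column is a factor
  is_ping <- ev == "ping"

  event <- ev
  event[is_ping] <- NA                # keep only pickup/dropoff events

  prev_event <- carry_forward(event)            # nearest event before each row
  next_event <- rev(carry_forward(rev(event)))  # nearest event after each row

  status <- ev
  status[is_ping] <- "unknown"        # default when a neighbour is missing
  status[is_ping & prev_event %in% "pickup"  & next_event %in% "dropoff"] <- "inuse"
  status[is_ping & prev_event %in% "dropoff" & next_event %in% "pickup"]  <- "nouse"
  status
}

# ave() applies the function separately within each id, so a ping is never
# matched against events belonging to a different id.
df$status <- ave(as.character(df$pickup_dropoff), df$id, FUN = label_status)
print(df)
```

If the rows were not already ordered within each id, the time column would need to be parsed and the data sorted first; that step is left out here.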
Any help is appreciated. Thanks!