
I have the following df:

    id    time              x     y      pickup_dropoff
    1    2/1/2013 12:23    73    40       pickup
    1    2/1/2013 12:25    73    40.2     ping
    1    2/1/2013 12:27    73    40.5     ping
    1    2/1/2013 12:34    73    41       dropoff
    1    2/1/2013 12:35    73    41.4     ping
    1    1/1/2013 12:45   73.6   41       pickup
    1    1/1/2013 12:57   73.5   41       dropoff
    2    1/2/2013 12:54   73.6   42       ping   
    2    1/2/2013 13:00   73.45  42       pickup
    2    1/2/2013 14:00   73     42       dropoff
    2    1/2/2013 14:50   73.11  41       pickup
    2    1/2/2013 15:30   73     44       dropoff
    2    1/2/2013 16:00   73.1   41       pickup
    2    1/2/2013 18:00    74    42       dropoff

Thanks to the help I received in this post, Reshape Data partially from Wide to Long in R, I was able to reshape the data to resemble the above. I'm now looking to recode some of the factor values to show when a vehicle is in use or is cruising without being in use. This new variable would make the following assumptions:

  1. if a ping is between a pickup and a dropoff, the vehicle is in use
  2. if a ping is between a dropoff and a pickup, it's out of use

I'd like the output to look like the following:

        id    time              x     y      pickup_dropoff     status
         1    2/1/2013 12:23    73    40       pickup           pickup
         1    2/1/2013 12:25    73    40.2     ping              inuse      
         1    2/1/2013 12:27    73    40.5     ping              inuse
         1    2/1/2013 12:34    73    41       dropoff           dropoff
         1    2/1/2013 12:35    73    41.4     ping              nouse
         1    1/1/2013 12:45   73.6   41       pickup            pickup
         1    1/1/2013 12:57   73.5   41       dropoff           dropoff
         2    1/2/2013 12:54   73.6   42       ping              unknown
         2    1/2/2013 13:00   73.45  42       pickup            pickup 
         2    1/2/2013 14:00   73     42       dropoff           dropoff
         2    1/2/2013 14:50   73.11  41       pickup            pickup
         2    1/2/2013 15:30   73     44       dropoff           dropoff
         2    1/2/2013 16:00   73.1   41       pickup            pickup 
         2    1/2/2013 18:00    74    42       dropoff           dropoff 

I currently have pickup_dropoff coded as a factor with 3 levels.

One solution I am playing with is adding a column with the factor levels of 1, 2, 3, then using as.numeric to turn them into numeric values, and then writing a couple of if statements like the following (pseudocode):

            df$status = ifelse(df$pickup_dropoff LAYS BETWEEN 3
            and 1, df$pickup_dropoff == "inuse", df$pickup_dropoff)

I may be overthinking this, but I'm not sure whether there is a way to say "in between" in R. I also have to account for another dimension, id, since I don't want a ping between two different ids to be considered in use. In that case it should be considered "unknown", as the data I am working with is incomplete.

Any help is appreciated. Thanks!

LoF10
  • Sounds like you need more levels, not in between. – duffymo Jun 28 '16 at 18:44
  • That's true, I guess adding levels on a condition would be helpful. I'm wondering how to phrase an if statement to add levels, so if x factor value lies between y and x for a given id, code "in use". Not sure what the syntax would look like. – LoF10 Jun 28 '16 at 18:48
  • pickup and dropoff - is that your Russian chauffeur's name? Your data doesn't seem fine-grained enough for this. I'd expect to see a millisecond or nanosecond timestamp, because pickup, dropoff, between, and in use will all be true at discrete moments in time. – duffymo Jun 28 '16 at 18:57

1 Answer

I think this will work:

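library(dplyr)
df %>% mutate(
    # state that follows each event: "inuse" after a pickup, "nouse" after a dropoff
    status = ifelse(pickup_dropoff == "pickup", "inuse",
        ifelse(pickup_dropoff == "dropoff", "nouse", NA))
) %>%
group_by(id) %>%
# within each id: carry the last state forward over the pings (leading NAs are kept)
mutate(status = zoo::na.locf(status, na.rm = FALSE),
       # restore the event labels; as.character() stops ifelse() returning factor codes
       status = ifelse(pickup_dropoff %in% c("pickup", "dropoff"), as.character(pickup_dropoff), status),
       # pings before the first pickup/dropoff of an id have no known state
       status = ifelse(is.na(status), "unknown", status))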

First we put in the values that we want the new column to take after each pickup and dropoff, leaving everything else as NA. Then we fill in the missing values using zoo::na.locf (grouped by id). Lastly, we reset the values at pickup and dropoff to what we actually want.
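
To make the three steps concrete, here is a minimal base-R sketch on the events of a single id. It assumes the rows are already in chronological order, and it swaps in a small hypothetical helper, `fill_forward()`, for `zoo::na.locf(x, na.rm = FALSE)` so that it runs without extra packages:

```r
# Hypothetical stand-in for zoo::na.locf(x, na.rm = FALSE):
# carry the last non-NA value forward, keeping leading NAs.
fill_forward <- function(x) {
  idx <- cumsum(!is.na(x))        # index of the most recent non-NA value (0 = none yet)
  c(NA, x[!is.na(x)])[idx + 1]    # slot 1 holds the NA used for leading gaps
}

# Events for one id, assumed to be sorted by time
events <- c("ping", "pickup", "ping", "ping", "dropoff", "ping", "pickup")

# Step 1: the state that follows each pickup/dropoff; pings stay NA
status <- ifelse(events == "pickup", "inuse",
                 ifelse(events == "dropoff", "nouse", NA))

# Step 2: carry the last known state forward over the pings
status <- fill_forward(status)

# Step 3: restore the labels on the pickup/dropoff rows themselves,
# and mark pings that precede any pickup/dropoff as "unknown"
status <- ifelse(events %in% c("pickup", "dropoff"), events, status)
status[is.na(status)] <- "unknown"

print(data.frame(events, status))
```

The leading ping ends up as "unknown" because no pickup or dropoff precedes it, which is the behaviour asked for in the question.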

This creates a character vector - you can of course stick a factor conversion at the end.
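
A sketch of that final conversion, assuming the five labels used above. The level order here is an arbitrary choice for illustration, and `status_levels` is a name made up for this example:

```r
# Example character vector standing in for the new status column
status <- c("unknown", "pickup", "inuse", "dropoff", "nouse")

# Assumed level order; pick whatever order suits your analysis
status_levels <- c("pickup", "inuse", "dropoff", "nouse", "unknown")

status_f <- factor(status, levels = status_levels)
print(table(status_f))
```

On the data frame itself this would look like `df$status <- factor(df$status, levels = status_levels)`.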


Using plyr or base instead of dplyr:

df$status = with(df, ifelse(pickup_dropoff == "pickup", "inuse",
                ifelse(pickup_dropoff == "dropoff", "nouse", NA)))

## pick one
# base
df$status = ave(df$status, df$id, FUN = function(x) zoo::na.locf(x, na.rm = FALSE))
# plyr
df = plyr::ddply(df, "id", plyr::mutate, status = zoo::na.locf(status, na.rm = FALSE))

# as.character() keeps ifelse() from returning the factor's integer codes
df$status = with(df, ifelse(pickup_dropoff %in% c("pickup", "dropoff"), as.character(pickup_dropoff), status))
df$status = with(df, ifelse(is.na(status), "unknown", status))
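
One caveat if pickup_dropoff is stored as a factor, as described in the question: `ifelse()` does not keep factor attributes, so passing a factor as the "yes" value gives back its integer codes rather than its labels. A small sketch of the difference, using a made-up three-element factor `f`:

```r
f <- factor(c("pickup", "ping", "dropoff"))
is_event <- f %in% c("pickup", "dropoff")

# Factor passed straight through: the event rows come back as level codes
bad <- ifelse(is_event, f, "inuse")

# Converted first: the event rows keep their labels
good <- ifelse(is_event, as.character(f), "inuse")

print(data.frame(f, bad, good))
```

Wrapping the column in `as.character()` before the `ifelse()` call is harmless if it is already a character vector.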
Gregor Thomas
  • Would this be possible with the plyr package? I'm running a higher version of R that won't allow me to load the dplyr package. – LoF10 Jun 28 '16 at 19:59
  • You mean a lower version? Sure - I can try to make a quick edit. – Gregor Thomas Jun 28 '16 at 20:36
  • Thanks for pointing me in the right direction! When I try to run the ave function on df$status I get the following error: Error in `split<-.default`(`*tmp*`, g, value = lapply(split(x, g), FUN)) : replacement has length zero. Any ideas? – LoF10 Jun 28 '16 at 22:34
  • I wasn't accounting for IDs in your data that start with a `ping`. I'll make some edits.... should be better now. – Gregor Thomas Jun 28 '16 at 23:24
  • Btw, if you had shared your data with `dput()` so it was copy/pasteable I would have checked my solution. `dput(droplevels(head(df, 14)))` would share the top 14 rows of your data in a very convenient way. – Gregor Thomas Jun 28 '16 at 23:29
  • Understood, thanks for the help. Moving forward I'll make sure to make the data shareable. I'll test this in a bit and see if it goes through. Thanks! – LoF10 Jun 29 '16 at 13:40