0

I have a df data and I would like to add to a new column a value that exist in a previous column and row if the factor is the same.

Here is a sample:

data <- structure(list(Id = c("a", "b", "b", "b", "a", "a", "b", "b", 
"a", "a"), duration.minutes = c(NA, 139L, 535L, 150L, NA, NA, 
145L, 545L, 144L, NA), event = structure(c(1L, 4L, 3L, 4L, 2L, 
1L, 4L, 3L, 4L, 2L), .Label = c("enter", "exit", "stop", "trip"
), class = "factor")), .Names = c("Id", "duration.minutes", "event"
), class = "data.frame", row.names = 265:274)

and I would like to add a new column called "duration.minutes.past" like this:

data <- structure(list(Id = c("a", "b", "b", "b", "a", "a", "b", "b", 
"a", "a"), duration.minutes = c(NA, 139L, 535L, 150L, NA, NA, 
145L, 545L, 144L, NA), event = structure(c(1L, 4L, 3L, 4L, 2L, 
1L, 4L, 3L, 4L, 2L), .Label = c("enter", "exit", "stop", "trip"
), class = "factor"), duration.minutes.past = c(NA, NA, 139, 
NA, NA, NA, NA, 145, NA, NA)), .Names = c("Id", "duration.minutes", 
"event", "duration.minutes.past"), row.names = 265:274, class = "data.frame")

As you can see, I added in this new column duration.minutes.past the duration.minutes of the previous trip for the same Id. if the Id is different or if is it not a stop, then the value for duration.minutes.past is NA.

Help is much appreciated!

h3rm4n
  • 4,126
  • 15
  • 21
Floni
  • 475
  • 2
  • 13

2 Answers2

2

A possible solution using dplyr,

library(dplyr)

df %>% 
 group_by(Id) %>% 
 mutate(new = replace(lag(duration.minutes), event != 'stop', NA))

#Source: local data frame [10 x 4]
#Groups: Id [2]

#      Id duration.minutes  event   new
#   <chr>            <int> <fctr> <int>
#1      a               NA  enter    NA
#2      b              139   trip    NA
#3      b              535   stop   139
#4      b              150   trip    NA
#5      a               NA   exit    NA
#6      a               NA  enter    NA
#7      b              145   trip    NA
#8      b              545   stop   145
#9      a              144   trip    NA
#10     a               NA   exit    NA
Sotos
  • 51,121
  • 6
  • 32
  • 66
1

We can do this with data.table. Convert the 'data.frame' to 'data.table' (setDT(data)), grouped by 'Id', we create the lag column of 'duration.minutes' using shift), then change the value to 'NA' where the 'event' is not equal to 'stop'

library(data.table)
setDT(data)[, duration.minutes.past := shift(duration.minutes), 
             Id][event != "stop", duration.minutes.past := NA][]
data
#    Id duration.minutes event duration.minutes.past
#1:  a               NA enter                    NA
#2:  b              139  trip                    NA
#3:  b              535  stop                   139
#4:  b              150  trip                    NA
#5:  a               NA  exit                    NA
#6:  a               NA enter                    NA
#7:  b              145  trip                    NA
#8:  b              545  stop                   145
#9:  a              144  trip                    NA
#10: a               NA  exit                    NA

Or this can be done with base R using ave

data$duration.minutes.past <- with(data, NA^(event != "stop") * 
      ave(duration.minutes, Id, FUN = function(x) c(NA, x[-length(x)])))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Also a good solution, but I have more columns than in my sample :). but thanks! – Floni Apr 18 '17 at 08:55
  • @Floni What is the problem with having more columns? I create a new column using `:=`, so it will not have any impact on other columns – akrun Apr 18 '17 at 08:59
  • I actually tried your method but I edited it before to run it... so it did not work – Floni Apr 18 '17 at 09:09