3

I am working with a longitudinal dataset with ID's, timestamps, and values. I would like to create a new variable for each individual that saves the value for the last timestamp for that individual in each of that individual's rows. For example, if my data looked like this:

    ID Time Value
    1  1:20  1
    1  2:43  2
    1  1:56  3
    2  1:10  4
    2  1:05  5

I would want to create a new variable Final_value as such:

    ID Time Value Final_value
    1  1:20  1    2
    1  2:43  2    2
    1  1:56  3    2
    2  1:10  4    4
    2  1:05  5    4

I prefer to use R's built-in methods. Perhaps there is a way to do this using the "ave" function?

Jess
  • 33
  • 2

2 Answers2

1

Here is an option with dplyr. Grouped by 'ID', convert the 'Time' to time class with hm from lubridate, get the index of the max (which.max) and subset the 'Value' based on that index

library(dplyr)
library(lubridate)
df1 %>%
   group_by(ID) %>%
   mutate(Final_value = Value[which.max(as.numeric(hm(Time)))])
# A tibble: 5 x 4
# Groups:   ID [2]
#     ID Time  Value Final_value
#  <int> <chr> <int>       <int>
#1     1 1:20      1           2
#2     1 2:43      2           2
#3     1 1:56      3           2
#4     2 1:10      4           4
#5     2 1:05      5           4

Or using base R

df1$Final_value <- with(df1, Value[as.logical(ave(as.numeric(as.POSIXct(Time, 
           format = '%H:%M')),
      ID, FUN = function(x) x== max(x)))][ID])

Or another option is to order the data based on the 'Time' and 'ID', and then get the last value with ave

i1 <- with(df1, order(ID, as.POSIXct(TIme, format = '%H:%M')))
with(df1[i1,], ave(Value, ID, FUN = function(x) x[length(x)]))

data

df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L), Time = c("1:20", "2:43", 
"1:56", "1:10", "1:05"), Value = 1:5), class = "data.frame", row.names = c(NA, 
-5L))
akrun
  • 874,273
  • 37
  • 540
  • 662
1

And here is the data.table way of thinhs

library( data.table )
#make your data (dt) a data.table
setDT( dt )
#set Time as a  iTime
dt[, Time := as.ITime( Time )]
#get the value of the max Time, by ID
dt[, Final_value :=  , by = .( ID ) ]
#perform an update join, where you join 
#  the maximum Value by ID on dt itself
dt[ dt[dt[, .I[Time == max(Time)], by = ID]$V1], 
    Final_value := i.Value, 
    on = .(ID)]

#    ID     Time Value Final_value
# 1:  1 01:20:00     1           2
# 2:  1 02:43:00     2           2
# 3:  1 01:56:00     3           2
# 4:  2 01:10:00     4           4
# 5:  2 01:05:00     5           4
Wimpel
  • 26,031
  • 1
  • 20
  • 37