1

I have troubles with the following problem. I need to keep track of the maximum value of crp and gluc for each ID in the 3 hours before and after each measurement in a series. I tried different solutions over the last few months and was not able to solve this. I have a dataset like the one below:

ID  crp gluc hour
1   5   300  0.3
1   2   NA   0.9
1   NA  89   1.2
1   9   NA   4
1   NA  100  7.1
2   0   NA   0.3
2   NA  50   1
2   NA  70   2.2
2   1   80   5

The result should be:

ID  crp gluc hours  maxCrp  MaxGluc
1   5   300  0.3    5       300
1   2   NA   0.9    5       300
1   NA  89   1.2    9       300
1   9   NA   4      9       89
1   NA  100  7.1    NA      100
2   0   NA   0.3    0       70
2   NA  50   1      0       70
2   NA  70   2.2    1       80
2   1   80   5      1       80

Thank you in advance, Erik

PeterK
  • 1,185
  • 1
  • 9
  • 23
E.K.
  • 13
  • 2
  • 1
    Welcome on board. Please revise your question as there is no 3 hours subset!. Also, please check this [link] (https://stackoverflow.com/questions/24558328/how-to-select-the-row-with-the-maximum-value-in-each-group). – Mohamed Rahouma Dec 06 '20 at 15:01

2 Answers2

0

It's tempting, and generally useful, to use a cool R tool like mutate for this, but since your problem is very specific I prefer using for loops with clear statements that implement each part of your requirement. Here's how I would go:

df1 <- data.frame(ID=c(1,1,1,1,1,2,2,2,2),crp=c(5,2,NA,9,NA,0,NA,NA,1),gluc=c(300,NA,89,NA,100,NA,50,70,80),hour=c(0.3,0.9,1.2,4,7.1,.3,1,2.2,5)) 
df1$maxCrp <- 0
df1$MaxGluc <- 0
clinical_window <- 3
for (i in 1:nrow(df1)){
  mask <- (df1$ID == df1$ID[i]) & (abs(df1$hour-df1$hour[i]) < clinical_window)
  df1$maxCrp[i] <- max(df1$crp[mask], na.rm=TRUE)
  df1$MaxGluc[i] <- max(df1$gluc[mask], na.rm=TRUE)
}
df1$maxCrp[is.infinite(df1$maxCrp)] <- NA
df1$MaxGluc[is.infinite(df1$MaxGluc)] <- NA

This code will throw a warning, as it should, where there is missing data.

PeterK
  • 1,185
  • 1
  • 9
  • 23
0

If interested, you could use data.table and the following approach.

This includes a non-equi self-join, including max values for crp and gluc based on values for hour that fall between +/- 3 hours.

The tmp temporary data.table with the max results are joined back to the original data.

The last line with lapply is from here as one way to remove infinite values from a data table.

library(data.table)

setDT(dt)

tmp <- dt[dt[, .(ID, hour, lower = hour - 3, upper = hour + 3)]
   , on = .(ID = ID, hour >= lower, hour <= upper)
   , .(ID, hour, crp, gluc)
   , allow.cartesian = TRUE
][, .(maxCrp = max(crp, na.rm = T), maxGluc = max(gluc, na.rm = T))
  , by = .(ID, hour)]

dt <- dt[tmp, on = .(ID, hour)]

invisible(lapply(names(dt), 
                 function(.name) set(dt, which(is.infinite(dt[[.name]])), 
                                     j = .name, value = NA)))

dt

Output

   ID crp gluc hour maxCrp maxGluc
1:  1   5  300  0.3      5     300
2:  1   2   NA  0.9      5     300
3:  1  NA   89  1.2      9     300
4:  1   9   NA  4.0      9      89
5:  1  NA  100  7.1     NA     100
6:  2   0   NA  0.3      0      70
7:  2  NA   50  1.0      0      70
8:  2  NA   70  2.2      1      80
9:  2   1   80  5.0      1      80
Ben
  • 28,684
  • 5
  • 23
  • 45