0

I have a dataset with repeated measures: measurements nested within participants (ID) nested in groups. A variable G (with range 0-100) was measured on the group-level. I want to create a new column that shows:

  • The first day on which the maximum value of G was reached in a group coded as zero.
  • How many days each measurement (in this same group) occurred before or after the day on which the maximum was reached. For example: a measurement taken 2 days before the maximum is then coded -2, and a measurement 5 days after the maximum is coded as 5.

Here is an example of what I'm aiming for: Example

I highlighted the days on which the maximum value of G was reached in the different groups. The column 'New' is what I'm trying to get.

I've been trying with dplyr and I managed to get for each group the maximum with group_by, arrange(desc), slice. I then recoded those maxima into zero and joined this dataframe with my original dataframe. However, I cannot manage to do the 'sequence' of days leading up to/ days from the maximum.

EDIT: sorry I didn't include a reprex. I used this code so far:

To find the maximum value: First order by date

data <- data[with(data, order(G, Date)),]

Find maximum and join with original data:

data2 <- data %>% 
dplyr::group_by(Group) %>%
arrange(desc(c(G)), .by_group=TRUE) %>%
slice(1) %>%
ungroup()

data2$New <- data2$G

data2 <- data2 %>%
dplyr::select(c("ID", "New", "Date"))

data3 <- full_join(data, data2, by=c("ID", "Date"))

data3$New[!is.na(data3$New)] <- 0

This gives me the maxima coded as zero and all the other measurements in column New as NA but not yet the number of days leading up to this, and the number of days since. I have no idea how to get to this.

Anne
  • 1
  • 1
  • Hi and welcome to Stack Overflow! Please provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so that other users can help you better – Ric S Jun 22 '20 at 12:10
  • Sorry! I edited my question – Anne Jun 22 '20 at 12:22

1 Answers1

0

It would help if you would be able to provide the data using dput() in your question, as opposed to using an image.

It looked like you wanted to group_by(Group) in your example to compute number of days before and after the maximum date in a Group. However, you have ID of 3 and Group of A that suggests otherwise, and maybe could be clarified.

Here is one approach using tidyverse I hope will be helpful. After grouping and arranging by Date, you can look at the difference in dates comparing to the Date where G is maximum (the first maximum detected in date order).

Also note, as.numeric is included to provide a number, as the result for New is a difftime (e.g., "7 days").

library(tidyverse)

data %>%
  group_by(Group) %>%
  arrange(Date) %>%
  mutate(New = as.numeric(Date - Date[which.max(G)]))
Ben
  • 28,684
  • 5
  • 23
  • 45