0

In a dataset where each patient had multiple test administrations and a score on each test date, I have to identify the earliest & latest test dates, then subtract the difference of the scores of those dates. I think I've identified the first & last dates through dplyr, creating new columns for those:

SplitDates <- SortedDates %>% 
  group_by(PatientID) %>% 
  mutate(EarliestTestDate = min(AdministrationDate), 
         LatestTestDate = max(AdministrationDate)) %>% 
  arrange(desc(PatientID))

Score column is TotalScore

Now how do I extract the scores from these 2 dates (for each patient) to create new columns of earliest & latest scores? Haven't been able to figure out a mutate with case_when or if_else to create a score based on a record with a certain date.

camille
  • 16,432
  • 18
  • 38
  • 60
  • You can filter with a condition like `AdministrationDate == EarliestTestDate | AdministrationDate == LatestTestDate` or just `AdministrationDate %in% range(AdministrationDate)` – camille Dec 18 '21 at 22:05
  • Also perhaps use mutate to create a 'earliest - latest' column, e.g. `SplitDates <- SortedDates %>% group_by(PatientID) %>% mutate(EarliestTestDate = min(AdministrationDate), LatestTestDate = max(AdministrationDate), earliest_minus_latest = EarliestTestDate - LatestTestDate) %>% arrange(desc(PatientID))` – jared_mamrot Dec 18 '21 at 22:06
  • 1
    @jared_mamrot I think they're trying to get the difference of some score, not the dates themselves. Hard to say without a [reproducible example](https://stackoverflow.com/q/5963269/5325862) – camille Dec 18 '21 at 22:08
  • These answers cover the 1st part of how to get the earliest/latest test dates. The 2nd part of my problem is: I have a numerical TotalScore column. I want to create new columns for EarliestTestScore & LatestTestScore by somehow taking the TotalScore from the cases for each patient's EarliestTestDate and LatestTestDate. – CodeMonkey Dec 18 '21 at 22:13
  • 1
    That's why a sample of data would be helpful, although I'm pretty sure there should be posts already that have you covered – camille Dec 18 '21 at 23:12

1 Answers1

0

Have you tried to use one combine verb, like left_join, for example?

SplitDates <- SortedDates %>% 
    group_by(PatientID) %>% 
    mutate(EarliestTestDate = min(AdministrationDate), 
        LatestTestDate = max(AdministrationDate)) %>% 
    ungroup() %>%
    left_join(SortedDates,
        by = c(“PatientID” = “PatientID”, “AdministrationDate” = “EarliestTestDate”)) %>% # picking the score of EarliestTestDate
    left_join(SortedDates,
        by = c(“PatientID” = “PatientID”, “AdministrationDate” = “LatestTestDate”)) %>% # picking the score of EarliestTestDate
    arrange(desc(PatientID)) # now you can make the mutante task that you need.

I suggest to you see the dplyr cheatsheet.

GregOliveira
  • 151
  • 10