
My data consists of samples of different types, taken from patients over time. The full data set has more than 10,197 observations. A (small) example of my data is:

PatientName <- c("Jones", "Jones", "Jones", "Smith", "Smith", "Nixon", "Nixon", "Nixon")
SampleType <- c("Venous", "Arterial", "Capillary", "Venous", "Venous", "Venous", "Venous", "Capillary")
DayTested <- c("Monday", "Tuesday", "Wednesday", "Monday", "Monday", "Monday", "Monday", "Tuesday")

df <- data.frame(PatientName, SampleType, DayTested)

I now want to add an ID that numbers repeat samples of the same type taken from the same patient on the same day.

My anticipated output would be:

df$ID <- c(1,1,1,1,2,1,2,1)

This flags the repeat "Venous" samples that "Smith" and "Nixon" each had taken on "Monday": the second of each pair gets ID = 2. All other IDs are 1, as they are separate samples taken on separate days.

Is this possible to do in R?

user2716568

3 Answers


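We can use ave, grouping by patient, sample type and day, so that each repeat within a group gets the next number:

# seq_along numbers the rows 1, 2, ... within each patient/sample type/day group
df$ID <- with(df, as.integer(ave(as.character(SampleType),
         PatientName, SampleType, DayTested, FUN = seq_along)))
df$ID
#[1] 1 1 1 1 2 1 2 1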

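Or, as @lmo suggested, start from an integer vector so the result needs no conversion back from character:

# factor() keeps this working when SampleType is character (the data.frame default since R 4.0.0)
df$ID <- with(df, ave(as.integer(factor(SampleType)), PatientName, SampleType, DayTested, FUN = seq_along))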
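Whether SampleType belongs in the grouping only matters once a patient has two different sample types on the same day, which the question's example never does. A minimal sketch with hypothetical data (df2 is invented for illustration, not taken from the question) shows the difference:

```r
# Hypothetical data (not from the question): Jones has two *different*
# sample types on the same Monday
df2 <- data.frame(
  PatientName = c("Jones", "Jones"),
  SampleType  = c("Venous", "Arterial"),
  DayTested   = c("Monday", "Monday")
)

# Grouping by patient and day only: the Arterial sample is numbered 2
by_day <- with(df2, ave(seq_along(PatientName), PatientName, DayTested,
                        FUN = seq_along))
by_day
# [1] 1 2

# Adding SampleType to the grouping treats them as separate samples
by_type <- with(df2, ave(seq_along(PatientName), PatientName, SampleType,
                         DayTested, FUN = seq_along))
by_type
# [1] 1 1
```

Since the question asks for repeats of the same sample type, the second grouping is the one that matches the stated requirement.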
akrun

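Not exactly what you want, since duplicated() only marks whether a row has been seen before (a third repeat would also get 2, not 3), but this gives the desired result for your example.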

df$ID <- duplicated(df[, c("PatientName", "SampleType", "DayTested")]) + 1
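The limitation of duplicated() shows up as soon as there is a third repeat. A small sketch, using hypothetical data (df3 is invented for illustration, not from the question), compares it with a per-group count via ave:

```r
# Hypothetical data (not from the question): Smith has three Venous
# samples on Monday
df3 <- data.frame(
  PatientName = rep("Smith", 3),
  SampleType  = rep("Venous", 3),
  DayTested   = rep("Monday", 3)
)

# duplicated() only says "seen before or not", so every repeat gets 2
dup_id <- duplicated(df3) + 1
dup_id
# [1] 1 2 2

# A per-group running count keeps numbering: 1, 2, 3
count_id <- with(df3, ave(seq_along(PatientName), PatientName, SampleType,
                          DayTested, FUN = seq_along))
count_id
# [1] 1 2 3
```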
Andre Elrico

akrun's answer is perfect. Just to show a different way, here is one using dplyr and the cumsum function:

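library(dplyr)
# start every row at 1, then take a running total within each patient/sample type/day group
df %>% mutate(id = 1) %>% group_by(PatientName, SampleType, DayTested) %>% 
  mutate(id = cumsum(id)) %>% ungroup()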
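For comparison, the same running-count idea can be written in base R without dplyr. This is a sketch, assuming the question's example data: ave() applies cumsum to a vector of 1s within each patient/sample type/day group, mirroring mutate(id = 1) followed by cumsum(id).

```r
# The question's example data
PatientName <- c("Jones", "Jones", "Jones", "Smith", "Smith", "Nixon", "Nixon", "Nixon")
SampleType  <- c("Venous", "Arterial", "Capillary", "Venous", "Venous", "Venous", "Venous", "Capillary")
DayTested   <- c("Monday", "Tuesday", "Wednesday", "Monday", "Monday", "Monday", "Monday", "Tuesday")
df <- data.frame(PatientName, SampleType, DayTested)

# Running total of 1s within each group: the base-R analogue of
# mutate(id = 1) %>% group_by(...) %>% mutate(id = cumsum(id))
df$id <- ave(rep(1, nrow(df)), df$PatientName, df$SampleType, df$DayTested,
             FUN = cumsum)
df$id
# [1] 1 1 1 1 2 1 2 1
```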
Edwin