How to set unique ID for recurring names within column

Question

My data consists of samples of varying types, taken from patients over time. The full data set has over 10197 observations. A small example of my data is:

PatientName <- c("Jones", "Jones", "Jones", "Smith", "Smith", "Nixon", "Nixon", "Nixon")
SampleType <- c("Venous", "Arterial", "Capillary", "Venous", "Venous", "Venous", "Venous", "Capillary")
DayTested <- c("Monday", "Tuesday", "Wednesday", "Monday", "Monday", "Monday", "Monday", "Tuesday")

df <- data.frame(PatientName, SampleType, DayTested)

I now wish to add an ID that numbers repeat samples of the same type taken from the same patient on the same day.

My anticipated output would be:

df$ID <- c(1,1,1,1,2,1,2,1)

This picks up the repeat occurrences for "Smith" and "Nixon", who each have a repeat "Venous" sample taken on a "Monday", designated by ID = 2. All other IDs would be equal to 1, as they are separate samples taken on separate days.

Is this possible to do in R?

What happens if you have an extra row `Nixon Venous Tuesday` after the 7th row? – zx8754, Sep 28 '17 at 07:41

akrun · Accepted Answer · 2017-09-28T12:28:00.017

4

We can use `ave`:

df$ID <- with(df, as.integer(ave(as.character(SampleType),
         PatientName, DayTested, FUN = seq_along)))
df$ID
#[1] 1 1 1 1 2 1 2 1

Or as @lmo suggested

df$ID <- with(df, ave(as.integer(SampleType), PatientName, DayTested, FUN = seq_along))
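Note that both calls above group by PatientName and DayTested only, which happens to match the expected output for the sample data. The sketch below is not part of the original answer: it adds the hypothetical `Nixon Venous Tuesday` row from zx8754's comment (the name `df_extra` is made up for illustration) to show how the result changes once SampleType is included in the grouping. It also passes a plain row index as the first argument to `ave`, so it does not rely on SampleType being a factor; since R 4.0.0, `data.frame()` keeps strings as character by default, and `as.integer(SampleType)` would then produce NAs with a warning.

```r
# Sample data from the question plus the hypothetical extra row
# (Nixon, Venous, Tuesday) inserted after the 7th row.
df_extra <- data.frame(
  PatientName = c("Jones", "Jones", "Jones", "Smith", "Smith",
                  "Nixon", "Nixon", "Nixon", "Nixon"),
  SampleType  = c("Venous", "Arterial", "Capillary", "Venous", "Venous",
                  "Venous", "Venous", "Venous", "Capillary"),
  DayTested   = c("Monday", "Tuesday", "Wednesday", "Monday", "Monday",
                  "Monday", "Monday", "Tuesday", "Tuesday")
)

# Counter within patient and day only: the Capillary sample on Tuesday
# is numbered 2 because it follows a Venous sample on the same day.
id_day <- with(df_extra, ave(seq_along(PatientName),
                             PatientName, DayTested, FUN = seq_along))
id_day
#[1] 1 1 1 1 2 1 2 1 2

# Counter within patient, sample type and day: only true repeats of the
# same sample type are numbered above 1.
id_type <- with(df_extra, ave(seq_along(PatientName),
                              PatientName, SampleType, DayTested,
                              FUN = seq_along))
id_type
#[1] 1 1 1 1 2 1 2 1 1
```

Which grouping is appropriate depends on whether different sample types taken on the same day should count as repeats of each other.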

edited Sep 28 '17 at 12:28

answered Sep 28 '17 at 07:27

akrun

You could simplify this a bit: `with(df, ave(as.integer(SampleType), PatientName, DayTested, FUN = seq_along))`. – lmo Sep 28 '17 at 12:26

@lmo Thanks, I thought about that earlier. – akrun Sep 28 '17 at 12:27

score 2 · Answer 2 · answered Sep 28 '17 at 07:53

Not 100% what you want but this gives the desired result.

df$ID <- duplicated(df) + 1
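One limitation worth spelling out: `duplicated()` only flags whether a row has appeared before, so the result is always 1 or 2. A minimal sketch, using a made-up three-row data frame `x` (not from the question) in which the same sample is repeated three times, compares it with a per-group counter:

```r
# Hypothetical data: the same patient, sample type and day three times.
x <- data.frame(
  PatientName = rep("Smith", 3),
  SampleType  = rep("Venous", 3),
  DayTested   = rep("Monday", 3)
)

# duplicated() marks every repeat after the first as TRUE,
# so the second and third rows both get 2.
id_dup <- duplicated(x) + 1
id_dup
#[1] 1 2 2

# A running counter within each group keeps numbering the repeats.
id_seq <- ave(seq_len(nrow(x)),
              x$PatientName, x$SampleType, x$DayTested,
              FUN = seq_along)
id_seq
#[1] 1 2 3
```

For the sample data in the question, where no combination occurs more than twice, both approaches give the same IDs.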

answered Sep 28 '17 at 07:53

Andre Elrico

score 2 · Answer 3 · answered Sep 28 '17 at 08:02

akrun's answer is perfect. Just to show a different way, using dplyr and the cumsum function:

library(dplyr)

df %>% mutate(id = 1) %>% group_by(PatientName, SampleType, DayTested) %>%
  mutate(id = cumsum(id)) %>% ungroup()

answered Sep 28 '17 at 08:02

Edwin
