
I have a dataframe df that contains several columns.

The dataframe is already sorted by contact ID (C_ID), and a C_ID can appear multiple times. I want to put an "X" in the MainRecord column the first time each C_ID appears, so that the result looks like this:

C_ID  Name  MainRecord
1     JM    X
1     JM  
1     JM  
2     DM    X
3     TY    X
3     TY
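
For reproducibility, the input behind the table above could be built roughly like this. This is only a sketch: the column types (integer C_ID, character Name) are assumed from the printed values, and MainRecord is left out because it is the column to be filled in.

```r
# Example input reconstructed from the table above.
# Column types are an assumption; MainRecord is intentionally not created yet.
df <- data.frame(
  C_ID = c(1L, 1L, 1L, 2L, 3L, 3L),
  Name = c("JM", "JM", "JM", "DM", "TY", "TY"),
  stringsAsFactors = FALSE
)
```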

I think my solution would need to reference the head function: df[,head(1)]

Danny
    Here are some other ways (though you'll still have to map 0/1 to blank/X if you still want to do that): https://stackoverflow.com/q/26265732/ – Frank May 02 '18 at 15:56

1 Answer


We can group by 'C_ID' and 'Name', then create 'MainRecord' with case_when, marking the first row of each group:

library(dplyr)
df1 %>%
  group_by(C_ID, Name) %>%
  mutate(MainRecord = case_when(row_number()==1 ~ "X", TRUE ~ ""))
# A tibble: 6 x 3
# Groups:   C_ID, Name [3]
#   C_ID Name  MainRecord
#  <int> <chr> <chr>     
#1     1 JM    X         
#2     1 JM    ""        
#3     1 JM    ""        
#4     2 DM    X         
#5     3 TY    X         
#6     3 TY    ""        

Or another option is ifelse

df1 %>%
   group_by(C_ID, Name) %>% 
   mutate(MainRecord = ifelse(row_number()==1, "X", ""))

Or use indexing

df1 %>% 
   group_by(C_ID, Name) %>% 
   mutate(MainRecord = c("", "X")[(row_number()==1) + 1])

Or with data.table, get the row index of the first row in each group with .I, then assign (:=) "X" to those rows (the other rows stay NA if 'MainRecord' does not already exist)

library(data.table)
i1 <- setDT(df1)[, .I[seq_len(.N) == 1], .(C_ID, Name)]$V1
df1[i1, MainRecord := "X"]

Or with base R

i1 <- with(df1, ave(seq_along(C_ID), C_ID, Name, FUN = seq_along)==1)
df1$MainRecord[i1] <- "X"
akrun
    A one-line version of the data.table one: `df1[!duplicated(df1, by=c("C_ID", "Name")), MainRecord := "X"]` since duplicated.data.table has a by= arg. – Frank May 02 '18 at 16:41
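
The same duplicated() idea also works without any packages. The following is a minimal base R sketch, assuming the example data from the question (rebuilt here as df1 with assumed column types); first_row is just an illustrative name.

```r
# Example data, assumed to match the table in the question
df1 <- data.frame(
  C_ID = c(1L, 1L, 1L, 2L, 3L, 3L),
  Name = c("JM", "JM", "JM", "DM", "TY", "TY"),
  stringsAsFactors = FALSE
)

# duplicated() is FALSE for the first occurrence of each (C_ID, Name) pair,
# so negating it flags the first row of every group
first_row <- !duplicated(df1[c("C_ID", "Name")])

# Map TRUE/FALSE to "X"/"" to fill the MainRecord column
df1$MainRecord <- ifelse(first_row, "X", "")
```

Unlike the grouped versions, this does not rely on the rows being sorted: it marks the first occurrence in whatever order the rows appear.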