I am working with R, and here is my data frame:

Died.At <- c(22,40,72,41)
Writer.At <- c(16, 18, 36, 36)
First.Name <- c("John", "John", "Walt", "Walt")
Second.Name <- c("Doe", "Poe", "Whitman", "Austen")
Sex <- c("MALE", "MALE", "MALE", "MALE")

writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex)

I want to add a new column called id based on the first name (John and Walt in this case). I know I can easily do this by hand:

id<-c("1","1","2","2")

but I have a large data set to deal with. Also, a name will not appear again once it has changed, so there will not be any more John rows after Walt. Can anyone help me with this, please?

on9jai

1 Answer

We can try data.table, where .GRP is the integer group counter:

library(data.table)
setDT(writers_df)[, id:= .GRP, First.Name]

Or a base R option is

writers_df$id <- cumsum(!duplicated(writers_df$First.Name))
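As a quick check of the base R line, here is a minimal self-contained sketch on the data from the question. The id_match column and the shuffled vector are not from the original post; they are only there to illustrate that cumsum(!duplicated(...)) relies on equal names sitting next to each other (which the question says is the case), while match() against unique() would also cope with a name that comes back later.

```r
# Same data frame as in the question.
writers_df <- data.frame(
  Died.At     = c(22, 40, 72, 41),
  Writer.At   = c(16, 18, 36, 36),
  First.Name  = c("John", "John", "Walt", "Walt"),
  Second.Name = c("Doe", "Poe", "Whitman", "Austen"),
  Sex         = c("MALE", "MALE", "MALE", "MALE")
)

# Running count of first occurrences: correct when equal names are adjacent.
writers_df$id <- cumsum(!duplicated(writers_df$First.Name))

# Position of each name among the unique names, in order of first appearance.
# id_match is an illustrative column name, not part of the original answer.
writers_df$id_match <- match(writers_df$First.Name, unique(writers_df$First.Name))

print(writers_df$id)        # [1] 1 1 2 2
print(writers_df$id_match)  # [1] 1 1 2 2

# Hypothetical vector where a name reappears, to show where the two differ.
shuffled <- c("John", "Walt", "John")
print(cumsum(!duplicated(shuffled)))      # [1] 1 2 2
print(match(shuffled, unique(shuffled)))  # [1] 1 2 1
```

If the id has to be a character column, as in the hand-written example in the question, wrapping either result in as.character() should be enough.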

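Or using dplyr (group_indices_() is deprecated in current versions; cur_group_id() on grouped data does the same job)

library(dplyr)
# Note: group ids follow the sorted order of First.Name, not order of appearance
writers_df %>%
     group_by(First.Name) %>%
     mutate(id = cur_group_id()) %>%
     ungroup()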
akrun