I am looking for duplicated IDs (strings) in my data frame and, if there are any, I want to replace the IDs with a sequence. I tried using duplicated(x), but it does not seem to find any (there are duplicates in my CSV). I also tried the Stack Overflow question "Find duplicate values in R", but apparently nothing happened.

This is my code so far:

library(stringr)
if (duplicated(census$CS_ID)==TRUE){
  new_CS <- sprintf("CS%d",seq(1:length(census$CS_ID)-1))
  census$CS_ID <- str_replace_all(census$CS_ID, "^.*$", new_CS)
}

I think the main problem is the ==TRUE condition: maybe it should check whether there is any TRUE in the result of duplicated(), rather than comparing the whole vector.
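That suspicion can be checked with a small sketch. duplicated() returns one logical value per element, while if() expects a single TRUE or FALSE; with a longer vector, older R versions use only the first element (with a warning) and R 4.2.0 and later raise an error. Wrapping the result in any() collapses it to one value. Here `ids` is a hypothetical stand-in for census$CS_ID, and renumbering every ID from 1 is only one reading of "change for a sequence":

```r
# Sample IDs copied from the question; `ids` is a hypothetical stand-in for census$CS_ID.
ids <- c("CS620", "CS621", "CS622", "CS624", "CS624", "CS625", "CS626", "CS627")

# One logical value per ID: TRUE marks a repeat of an earlier value.
dup <- duplicated(ids)

# any() reduces the vector to a single TRUE/FALSE, which is what if() needs.
if (any(dup)) {
  # Assumed intent: renumber all IDs as CS1, CS2, ...
  ids_new <- sprintf("CS%d", seq_along(ids))
} else {
  ids_new <- ids
}

print(ids_new)
```

Because sprintf() is vectorised, the regex replacement with str_replace_all() is not needed for this step.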

This is a sample of my CSV:

 CS_ID (str)
 CS620
 CS621
 CS622
 CS624
 CS624     
 CS625
 CS626
 CS627

Thank you for any help! =)

Ignacio Such

1 Answer

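library(data.table)
# rowid(CS_ID) counts occurrences of each ID (1 for the first, 2 for the second, ...);
# repeated IDs get that counter appended, unique IDs are kept unchanged.
setDT(df)[, CS_ID_new := ifelse(rowid(CS_ID) > 1, paste(CS_ID, rowid(CS_ID)), CS_ID)][]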
#    CS_ID CS_ID_new
# 1: CS620     CS620
# 2: CS621     CS621
# 3: CS622     CS622
# 4: CS624     CS624
# 5: CS624   CS624 2
# 6: CS625     CS625
# 7: CS626     CS626
# 8: CS627     CS627
Wimpel
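For readers without data.table, the same idea can be sketched in base R. This is an assumption about an equivalent approach, not part of the posted answer: ave() with seq_along() is used here as a stand-in for rowid(), and `ids` is a hypothetical vector holding the sample IDs from the question.

```r
# Sample IDs copied from the question; `ids` is a hypothetical stand-in for the CS_ID column.
ids <- c("CS620", "CS621", "CS622", "CS624", "CS624", "CS625", "CS626", "CS627")

# Within-group counter: 1 for the first occurrence of an ID, 2 for the second, and so on.
occurrence <- ave(seq_along(ids), ids, FUN = seq_along)

# Append the counter only to repeated IDs; first occurrences stay unchanged.
ids_new <- ifelse(occurrence > 1, paste(ids, occurrence), ids)

print(ids_new)
```

If the suffix should not contain a space, paste0() or a different `sep` argument can be used instead of paste().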