
How can I remove duplicate values from the comma-separated strings in a column using R? For example, this is my column:

df <- data.frame(name = c(A = "a,a,b,c,d,d,d",
                          B = "a,b,b,b,f",
                          C = "d,d,d,d",
                          D = "a,a"))

And my expected column:

df <- data.frame(name = c(A = "a,b,c,d",
                          B = "a,b,f",
                          C = "d",
                          D = "a"))
  • Does this answer your question? [Remove duplicate values on each string in R](https://stackoverflow.com/questions/56324669/remove-duplicate-values-on-each-string-in-r) – camille Nov 05 '21 at 17:53

3 Answers


An option with map and strsplit (note that toString joins the values with ", ", not a bare ","):

library(tidyverse)
# map_chr() returns a character column; plain map() would leave a list column
df %>%
  mutate(name = strsplit(as.character(name), ",") %>%
           map_chr(~ toString(unique(.x))))
#        name
#1 a, b, c, d
#2    a, b, f
#3          d
#4          a

Or in base R with a regex that collapses consecutive repeats:

sub(",$", "", gsub("([a-z],)\\1+", "\\1", paste0(df$name, ",")))
#[1] "a,b,c,d" "a,b,f"   "d"       "a" 
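One caveat worth checking: the regex only collapses *adjacent* repeats of a single lowercase letter followed by a comma, which happens to match the sample data. A quick sketch wrapping the same pipeline in a hypothetical helper, dedupe_regex (the extra inputs "a,b,a" and "ab,b" are made up for illustration and are not from the question):

```r
# Same gsub/sub pipeline as above, wrapped in a hypothetical helper for testing
dedupe_regex <- function(x) {
  sub(",$", "", gsub("([a-z],)\\1+", "\\1", paste0(x, ",")))
}

dedupe_regex(c("a,a,b,c,d,d,d", "a,b,a", "ab,b"))
# returns "a,b,c,d" "a,b,a" "ab"
# "a,b,a": the repeated "a" survives because the copies are not adjacent
# "ab,b":  the value "b" is lost because the pattern matches the "b," inside "ab,"
```

So for non-adjacent duplicates or multi-character values, the strsplit/unique approaches are the safer choice.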
akrun

Using the tidyverse, we can first add the row names as a column, split the comma-separated strings into separate rows with separate_rows, group by rowname, drop duplicated values, and collapse each group back into a comma-separated string with toString.

library(tidyverse)

df %>%
  rownames_to_column() %>%
  separate_rows(name, sep = ",") %>%
  group_by(rowname) %>%
  filter(!duplicated(name)) %>%
  summarise(name = toString(name)) %>%
  column_to_rownames()

#        name
#A a, b, c, d
#B    a, b, f
#C          d
#D          a

A base R approach using sapply, which is much the same as @tmfmnk's answer:

sapply(strsplit(as.character(df$name), ","), function(x) toString(unique(x)))
#[1] "a, b, c, d" "a, b, f"    "d"          "a" 
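Because toString() uses ", " as the separator, the result is not exactly the "a,b,c,d" format shown in the question. A minimal base R sketch, assuming the column is character (or can be coerced to it), that rejoins with a plain comma and writes the result back into df; vapply() is used here instead of sapply() only to guarantee a character vector comes back:

```r
df <- data.frame(name = c(A = "a,a,b,c,d,d,d",
                          B = "a,b,b,b,f",
                          C = "d,d,d,d",
                          D = "a,a"))

# Split each string on literal commas, keep the first occurrence of each value,
# then glue the pieces back together with "," (no space)
df$name <- vapply(strsplit(as.character(df$name), ",", fixed = TRUE),
                  function(x) paste(unique(x), collapse = ","),
                  character(1))

df$name
# c("a,b,c,d", "a,b,f", "d", "a")
```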
Ronak Shah

One dplyr possibility could be:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(name = toString(unique(unlist(strsplit(name, ",")))))

  name      
  <chr>     
1 a, b, c, d
2 a, b, f   
3 d         
4 a 

The same with base R:

sapply(df$name, function(x) toString(unique(unlist(strsplit(x, ",")))), USE.NAMES = FALSE)
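If the strings may already contain spaces after the commas (for example, output previously produced by toString()), splitting on "," alone leaves tokens such as " a" and "a", which unique() treats as different values. A hedged base R variant that trims each piece before deduplicating; the input vector x here is made up for illustration:

```r
x <- c("a, a, b", "a,a, b ,b")

# Split on literal commas, strip surrounding whitespace from each piece,
# drop duplicates, and rejoin with ", " as toString() does
out <- vapply(strsplit(x, ",", fixed = TRUE),
              function(v) toString(unique(trimws(v))),
              character(1))

out
# c("a, b", "a, b")
```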
tmfmnk