R how to split elements in a column by delimiter and retain unique elements

Question

I have a dataframe in R with two columns:

  sampleID        annotation
    A1            orange; apple
    A2            apple; apple
    A3            apple; orange; orange; grapes; apple
    A4            grapes; orange

I would like to split the annotation column on the ";" delimiter, keep only the unique values within each row, and get the following output:

  sampleID        annotation
    A1            orange; apple
    A2            apple
    A3            apple; orange; grapes
    A4            grapes; orange

Possible duplicate of https://stackoverflow.com/questions/75494268/is-there-a-way-to-to-eliminate-duplicate-strings-inside-a-column-value-please/75494284#75494284 — akrun, Feb 20 '23 at 17:27

Maël · Accepted Answer · 2023-02-20T12:55:23.150

For each element in data$annotation, split the string, take the unique values, and paste them back into a single string (the last step is optional; skip it if you want a vector in each element).

base R:

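# sapply() returns a character vector, so it can be assigned straight back to the column
# (the \(x) lambda shorthand needs R >= 4.1)
data$annotation <- sapply(data$annotation, \(x) paste(unique(strsplit(x, "; ")[[1]]), collapse = "; "), USE.NAMES = FALSE)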
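
To try the base R approach end to end, here is a minimal sketch that rebuilds the question's table and overwrites the column. The data frame name `data` follows the answer; the helper name `dedupe` is made up for illustration, and `vapply()` is swapped in only to guarantee a plain character column.

```r
# Sample data reconstructed from the question (the name `data` is assumed).
data <- data.frame(
  sampleID   = c("A1", "A2", "A3", "A4"),
  annotation = c("orange; apple",
                 "apple; apple",
                 "apple; orange; orange; grapes; apple",
                 "grapes; orange"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one string on "; ", drop repeats
# (unique() keeps the first occurrence of each value), and re-join.
dedupe <- function(x) {
  paste(unique(strsplit(x, "; ", fixed = TRUE)[[1]]), collapse = "; ")
}

# vapply() enforces one string per row, so the column stays character.
data$annotation <- vapply(data$annotation, dedupe, character(1), USE.NAMES = FALSE)

print(data)
```

Because `unique()` preserves first-seen order, row A3 comes out as "apple; orange; grapes", matching the desired output in the question.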

tidyverse:

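library(purrr)
library(dplyr)
library(stringr)  # str_split_1() and str_unique() need stringr >= 1.5.0
data %>% 
  # map_chr() returns a character column; map() would produce a list-column
  mutate(annotation = map_chr(annotation, ~ str_flatten(str_unique(str_split_1(.x, "; ")), "; ")))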
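
If you would rather keep a vector in each element, as the answer mentions, the paste step can simply be dropped. The following is a rough base R sketch of that variant; the variable names `annotation` and `unique_items` are illustrative and not from the original post.

```r
# Annotation strings copied from the question.
annotation <- c("orange; apple",
                "apple; apple",
                "apple; orange; orange; grapes; apple",
                "grapes; orange")

# strsplit() is vectorised: it returns a list with one character vector
# per input string. unique() is then applied to each vector separately.
unique_items <- lapply(strsplit(annotation, "; ", fixed = TRUE), unique)

# Number of distinct annotations per sample.
n_unique <- lengths(unique_items)

print(unique_items)
print(n_unique)
```

The resulting list can be stored as a list-column with `data$annotation <- unique_items`, which is convenient when later steps need the individual values rather than a joined string.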
