0

I'm working on a data.table with a column like this:

A <- c("a;b;c","a;a;b","d;a;b","f;f;f")
df <- data.frame(A)

I would like to separate this column into 3 columns like this:

    seg1 seg2 seg3
1    a    b    c
2    a    b <NA>
3    d    a    b
4    f <NA> <NA>

The thing here is that when i split each row by ";" i need to keep unique of the row.

Frank
  • 66,179
  • 8
  • 96
  • 180
Orhan Yazar
  • 909
  • 7
  • 19

2 Answers2

1
library(stringr)

A <- c("a;b;c","a;a;b","d;a;b","f;f;f")
df <- data.frame(A)

df <- str_split_fixed(df$A, ";", 3)
df <- apply(X = df, 
            FUN = function(x){
              return(x[!duplicated(x)][1:ncol(df)])
            }, 
            MARGIN = 1)
df <- t(df)
df <- as.data.frame(df)
names(df) <- c("seg1", "seg2", "seg3")
df

#   seg1  seg2  seg3
# 1   a     b     c
# 2   a     b  <NA>
# 3   d     a     b
# 4   f  <NA>  <NA>
1

Here's a tidyverse approach. We split the character in A, keep only the unique values, paste the result back together and separate into three columns:

library(tidyverse)

df %>%
  mutate(A = map(strsplit(as.character(A), ";"), 
                 .f = ~ paste(unique(.x), collapse = ";"))) %>%
  separate(A, into = c("seg1", "seg2", "seg3"))

Which gives:

#  seg1 seg2 seg3
#1    a    b    c
#2    a    b <NA>
#3    d    a    b
#4    f <NA> <NA>
Steven Beaupré
  • 21,343
  • 7
  • 57
  • 77