-1

How to separate a column into many, based on a symbol "|" and any additional spaces around this symbol if any:

 input <- tibble(A = c("Ae1 tt1 | Ae2 tt2", "Be1 | Be2 | Be3")) 

 output <-  tibble(B = c("Ae1 tt1", "Be1") , C = c("Ae2 tt2", "Be2"), D = c(NA, "Be3")) 

I tried :

 input %>% 
    separate(A, c("B","C","D"))
    #separate(A, c("B","C","D"), sep = "|.")
    #mutate(B = str_split(A, "*|")) %>% unnest

What is the syntax with regex ?

Solution from R - separate with specific symbol, vertical bare, | (and tidyr::separate() producing unexpected results) does not provide expected output and produces a warning:

input %>% separate(col=A, into=c("B","C","D"), sep = '\\|')
# A tibble: 2 x 3
  B          C          D     
  <chr>      <chr>      <chr> 
1 "Ae1 tt1 " " Ae2 tt2"  <NA> 
2 "Be1 "     " Be2 "    " Be3"
Warning message:
Expected 3 pieces. Missing pieces filled with `NA` in 1 rows [1].

Using separate from tidyr with different length vectors does not seem related unfortunately.

Ryszard Czech
  • 18,032
  • 4
  • 24
  • 37
Felipe
  • 719
  • 8
  • 20

1 Answers1

1

You can use

output <- input %>%
    separate(col=A, into=c("B","C","D"), sep="\\s*\\|\\s*", fill="right")

R test:

> input %>% separate(col=A, into=c("B","C","D"), sep="\\s*\\|\\s*", fill="right")
# A tibble: 2 x 3
  B       C       D    
  <chr>   <chr>   <chr>
1 Ae1 tt1 Ae2 tt2 <NA> 
2 Be1     Be2     Be3 

The \s*\|\s* pattern matches a pipe char with any zero or more whitespace chars on both ends of the pipe.

The fill="right" argument fills with missing values on the right.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563