I have a dataset with a column containing the symbol '|' (it comes from the interaction of two variables in a model), and I want to split the column on this character.

The function separate works well with standard characters. Do you know how I can specify the character '|'?

library(tidyverse)
df <- data.frame(Interaction = c('var1|var2'))

# as expected
df %>% separate(Interaction, c('var1', 'var2'), sep = '1')
#   var1  var2
# 1  var |var2

# not as expected
df %>% separate(Interaction, c('var1', 'var2'), sep = '|')
#   var1 var2
# 1         v
demarsylvain

2 Answers

We can escape the | with \\, because it is a regex metacharacter meaning OR and sep is interpreted as a regular expression by default.

If we look at the ?separate documentation,

separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn", ...)

and it is described as

sep - If character, is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.

df %>% 
  separate(Interaction, c('var1', 'var2'), sep = '\\|')

or place it in square brackets

df %>% 
   separate(Interaction, c('var1', 'var2'), sep = '[|]')
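
The same behaviour can be reproduced outside of separate with base R's strsplit, which also treats its split argument as a regular expression by default. This is only a minimal sketch for illustration: the variable x is a made-up value mirroring the Interaction column, and strsplit is used as a stand-in, not as a claim about what separate does internally.

```r
# Hypothetical example string, mirroring the Interaction column above
x <- "var1|var2"

# Unescaped: as a regex, "|" is an alternation of two empty patterns,
# so it matches the empty string rather than the literal bar
unescaped <- strsplit(x, "|")[[1]]

# Escaped or bracketed: the bar is matched literally
escaped   <- strsplit(x, "\\|")[[1]]
bracketed <- strsplit(x, "[|]")[[1]]

# fixed = TRUE turns off regex interpretation entirely (base R only)
literal   <- strsplit(x, "|", fixed = TRUE)[[1]]

print(unescaped)
print(escaped)
```

The escaped, bracketed and fixed variants should all yield the two pieces var1 and var2, while the unescaped call does not split on the bar at all.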
akrun

The vertical bar is a special character in regular expressions, which is why it is not performing as expected. Escape it:

df %>% separate(Interaction, c('var1', 'var2'), sep = '\\|')

That should solve the problem.
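
If escaping feels error-prone, another possible route is tidyr's separate_wider_delim, which treats its delim argument as a fixed string by default. This is a sketch that assumes tidyr 1.3.0 or later is installed (the function is not available in older versions); the data frame is the one from the question.

```r
library(tidyr)

# Same example data as in the question
df <- data.frame(Interaction = c("var1|var2"))

# delim is matched as a fixed string by default, so "|" needs no escaping
out <- separate_wider_delim(
  df,
  Interaction,
  delim = "|",
  names = c("var1", "var2")
)

print(out)
```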

Emma