Splitting Text to Two Columns Using a Separator in R

Question

I am trying to split the column using a separator "|" but unfortunately the suggestions in Stack Overflow which recommended using separator() is not working in my situation.

Can someone help me out here.

The data frame, I am using is given below:

structure(list(UniqueID = c("12M-MA | X1", "12M-MA | X2", 
"12M-MA | X3", "12M-MA | X4", "12M-MA | X5"
), Cost = c(0.2, 0.3, 0.2, 0.2, 412.86), Actuals = c(0, 
0, 0, 0, 32), Forecast = c(0, 0, 0, 0, 21), Value_Actuals = c(0, 
0, 0, 0, 28341), Value_Forecast = c(0, 0, 0, 0, 652431
), `Forecast Accuracy` = c(0, 0, 0, 0, 8.51), `Max (Act vs Cons)` = c(0, 
0, 0, 0, 652431)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

The code which I have used and the one which is working is this:

library(readxl)
library(dplyr)
library(writexl)
library(stringr)
Df<- read_excel("C:/X/X/X/X/YXZ-2023.xlsx"skip = 1)

Df <- Df %>% 
  separate(UniqueID,c("ABC","XYZ"),sep = "|")

The expectation is to get keep the column as is and get two more columns one with the name of "ABC" and the other with "XYZ".

12-MA should go into the column "ABC" and the other value such as X1,X2,X3,X4,X5 should go into column "XYZ".

Can someone help me out and let me now what is it that I am doing wrong. This was the suggestion in stack over flow and everywhere but isn't working for me.

Try with escaping \\ the | -> `df %>% separate(UniqueID,c("ABC","XYZ"),sep = "\\|", remove = FALSE)` — TarJae, May 24 '23 at 15:56
@TarJae This worked like a charm. Thank you so much. But can you explain why we use "\\" here when the separator is just a "|" ? — user20203146, May 24 '23 at 22:13
\\ in R escapes special characters. See here ! Here is a list of special characters. — TarJae, May 25 '23 at 11:42
@TarJae Can you please post this as Answer. So that I can mark this as Correct answer and complete the question. And Thank you so much. — user20203146, May 25 '23 at 15:09

freyberg · Accepted Answer · 2023-05-25T19:14:07.080

2

Have you tried the newer separate_wider_delim with cols_remove = F?

require(tidyverse)
# require(tidyr) # if you do not want to import `tidyverse`

Df %>% separate_wider_delim(UniqueID, delim = "|", names = c("ABC", "XYZ"), cols_remove = F)
Df %>% separate_wider_delim(UniqueID, delim = " | ", names = c("ABC", "XYZ"), cols_remove = F) # if you do not want " " in the new columns

# A tibble: 5 × 10
  ABC    XYZ   UniqueID     Cost Actuals Forecast Value_Actuals Value_Forecast `Forecast Accuracy` `Max (Act vs Cons)`
  <chr>  <chr> <chr>       <dbl>   <dbl>    <dbl>         <dbl>          <dbl>               <dbl>               <dbl>
1 12M-MA X1    12M-MA | X1   0.2       0        0             0              0                0                      0
2 12M-MA X2    12M-MA | X2   0.3       0        0             0              0                0                      0
3 12M-MA X3    12M-MA | X3   0.2       0        0             0              0                0                      0
4 12M-MA X4    12M-MA | X4   0.2       0        0             0              0                0                      0
5 12M-MA X5    12M-MA | X5 413.       32       21         28341         652431                8.51              652431

edited May 25 '23 at 19:14

answered May 24 '23 at 15:50

freyberg

367
2
9

Which library has this function? – user20203146 May 24 '23 at 22:15
@user20203146 I edited the answer to include the library. It's `tidyr`. Since you are already using packages from the `tidyverse`, I would recommend simply importing everything with `require(tidyverse)` (after installing it with `install.packages(tidyverse)`. – freyberg May 25 '23 at 19:15
Sorry it doesn't work. But tidyr and tidyverse does not have this function. The function which does this function as per tidyr is **separate()** . When I run this function it gives the result **could not find the function separate_wider_delim**. – user20203146 May 26 '23 at 08:58
1

@user20203146 Then you most likely have an old version of `tidyverse` or `tidyr` installed. It is definitely a `tidyr` function (see https://tidyr.tidyverse.org/reference/separate_wider_delim.html). – freyberg May 27 '23 at 07:48
Alright. Let me update the library and see. – user20203146 May 30 '23 at 07:43
It worked. I have one question. It still keeps space after the text or before the text. How to ensure that text is not present while splitting it? – user20203146 Jul 17 '23 at 14:44

score 1 · Answer 2 · answered May 25 '23 at 16:09

We have to escape | with: \\

\\ in R escapes special characters. See here.

Here is a list of special characters.

To keep the original column we use remove = FALSE argument.

Finally we use across with trimws to remove spaces.

library(dplyr)
library(tidyr)

df %>%    
  separate(UniqueID,c("ABC","XYZ"),sep = "\\|", remove = FALSE) %>% 
  mutate(across(c(ABC, XYZ), trimws))

 UniqueID    ABC    XYZ    Cost Actuals Forecast Value_Actuals Value_Forecast `Forecast Accuracy` `Max (Act vs Cons)`
  <chr>       <chr>  <chr> <dbl>   <dbl>    <dbl>         <dbl>          <dbl>               <dbl>               <dbl>
1 12M-MA | X1 12M-MA X1      0.2       0        0             0              0                0                      0
2 12M-MA | X2 12M-MA X2      0.3       0        0             0              0                0                      0
3 12M-MA | X3 12M-MA X3      0.2       0        0             0              0                0                      0
4 12M-MA | X4 12M-MA X4      0.2       0        0             0              0                0                      0
5 12M-MA | X5 12M-MA X5    413.       32       21         28341         652431                8.51              652431

score 0 · Answer 3 · answered May 25 '23 at 16:19

Another solution is with extract:

library(tidyr)
df %>%
  extract(UniqueID,
          c("ABC", "XYZ"),
          "(.*)\\s\\|\\s(.*)")
# A tibble: 5 × 9
  ABC    XYZ    Cost Actuals Forecast Value_Actuals Value_Forecast `Forecast Accuracy` `Max (Act vs Cons)`
  <chr>  <chr> <dbl>   <dbl>    <dbl>         <dbl>          <dbl>               <dbl>               <dbl>
1 12M-MA X1      0.2       0        0             0              0                0                      0
2 12M-MA X2      0.3       0        0             0              0                0                      0
3 12M-MA X3      0.2       0        0             0              0                0                      0
4 12M-MA X4      0.2       0        0             0              0                0                      0
5 12M-MA X5    413.       32       21         28341         652431                8.51              652431

Splitting Text to Two Columns Using a Separator in R

3 Answers3