How to mimic Microsoft Excel's Text to Columns feature in R to split variable number of a delimiter symbol in multiple strings in to multiple columns?

Question

As an extension to several questions in this forum similar to this, and since none of the suggestions already provided in this forum I checked could provide me the solution I am looking for, I am posting this question.

I would like to split a list of variable length strings that contain a variable number of a symbol that could be considered as a delimiter into a dataframe as shown in the example below. . Any suggestions?

x = c(NA, "a/b", "a/b/c/d", "a/b/c","a/c","a/b/c/d/e")

df <-  data.frame(x1 = c("a","a","a","a","a"),
                  x2 = c("b","b","b","c","b"),
                  x3 = c(NA,"c","c",NA,"c"),
                  x4 = c(NA,"d",NA,NA,"d"),
                  x5 = c(NA,NA,NA,NA,"e"))

Very close to https://stackoverflow.com/questions/56356632/separate-a-column-of-a-dataframe-in-undefined-number-of-columns-with-r-tidyverse Actually, the answer there works. If you put `x` in data.frame, `df <- data.frame(x, stringsAsFactors = FALSE)` and then do `nmax <- max(stringr::str_count(df$x, "\\/"), na.rm = TRUE) + 1; tidyr::separate(df, x, paste0("col", seq_len(nmax)), sep = "\\/", fill = "right")` — Ronak Shah, Jun 02 '19 at 15:40
@ronak-shah, That was a great swift practical response solving my problem instantaneously. Can you please provide this solution as an Answer so that I can accept it as a solution to my query? — RanonKahn, Jun 02 '19 at 15:46
It's the same answer from the link I posted above so this should be closed as dupe of that link. Since, I was not sure about it at first I reopened it and now I cannot close it. Would wait for somebody else to come and close it :) — Ronak Shah, Jun 02 '19 at 15:52

score 1 · Answer 1 · answered Jun 02 '19 at 16:58

I know the answer is already there (+ duplicate) but I wanted to provide a general approach. IMHO current answer does not solve a general case that would involve removing NA and setting column names to x1,x2... as the question suggests.

# example data
x = c(NA, "a/b", "a/b/c/d", "a/b/c","a/c","a/b/c/d/e")
df_q <-  data.frame(x1 = c("a","a","a","a","a"),
                  x2 = c("b","b","b","c","b"),
                  x3 = c(NA,"c","c",NA,"c"),
                  x4 = c(NA,"d",NA,NA,"d"),
                  x5 = c(NA,NA,NA,NA,"e"))

# code
library(magrittr)
df_a = stringr::str_split(x,"/") %>% 
  purrr::keep(~ all(!is.na(.))) %>% 
  purrr::map(function(y){
    res = data.frame(matrix(y,nrow = 1),stringsAsFactors = FALSE)
    colnames(res) = glue::glue("x{1:length(y)}")
    res
  }) %>% 
  dplyr::bind_rows() %>% 
  purrr::map(as.factor) %>% 
  as.data.frame()

# test
identical(df_a,df_q)
#> [1] TRUE

^{Created on 2019-06-02 by the reprex package (v0.2.1)}

score 0 · Accepted Answer · answered Jun 02 '19 at 15:57

We can use read.table from base R after pasteing the strings into a single string delimited by \n

read.table(text = paste(x[-1], collapse="\n"), 
   header = FALSE, stringsAsFactors = FALSE, sep="/", fill = TRUE, na.strings = "")
#  V1 V2   V3   V4   V5
#1  a  b <NA> <NA> <NA>
#2  a  b    c    d <NA>
#3  a  b    c <NA> <NA>
#4  a  c <NA> <NA> <NA>
#5  a  b    c    d    e

Or with fread from data.table

library(data.table)
fread(paste(x[-1], collapse="\n"), sep="/", header = FALSE, fill = TRUE)

How to mimic Microsoft Excel's Text to Columns feature in R to split variable number of a delimiter symbol in multiple strings in to multiple columns?

2 Answers2