1

As an extension to several questions in this forum similar to this, and since none of the suggestions already provided in this forum I checked could provide me the solution I am looking for, I am posting this question.

I would like to split a list of variable length strings that contain a variable number of a symbol that could be considered as a delimiter into a dataframe as shown in the example below. . Any suggestions?

x = c(NA, "a/b", "a/b/c/d", "a/b/c","a/c","a/b/c/d/e")

df <-  data.frame(x1 = c("a","a","a","a","a"),
                  x2 = c("b","b","b","c","b"),
                  x3 = c(NA,"c","c",NA,"c"),
                  x4 = c(NA,"d",NA,NA,"d"),
                  x5 = c(NA,NA,NA,NA,"e"))
RanonKahn
  • 853
  • 10
  • 34
  • Very close to https://stackoverflow.com/questions/56356632/separate-a-column-of-a-dataframe-in-undefined-number-of-columns-with-r-tidyverse Actually, the answer there works. If you put `x` in data.frame, `df <- data.frame(x, stringsAsFactors = FALSE)` and then do `nmax <- max(stringr::str_count(df$x, "\\/"), na.rm = TRUE) + 1; tidyr::separate(df, x, paste0("col", seq_len(nmax)), sep = "\\/", fill = "right")` – Ronak Shah Jun 02 '19 at 15:40
  • @ronak-shah, That was a great swift practical response solving my problem instantaneously. Can you please provide this solution as an Answer so that I can accept it as a solution to my query? – RanonKahn Jun 02 '19 at 15:46
  • It's the same answer from the link I posted above so this should be closed as dupe of that link. Since, I was not sure about it at first I reopened it and now I cannot close it. Would wait for somebody else to come and close it :) – Ronak Shah Jun 02 '19 at 15:52

2 Answers2

1

I know the answer is already there (+ duplicate) but I wanted to provide a general approach. IMHO current answer does not solve a general case that would involve removing NA and setting column names to x1,x2... as the question suggests.

# example data
x = c(NA, "a/b", "a/b/c/d", "a/b/c","a/c","a/b/c/d/e")
df_q <-  data.frame(x1 = c("a","a","a","a","a"),
                  x2 = c("b","b","b","c","b"),
                  x3 = c(NA,"c","c",NA,"c"),
                  x4 = c(NA,"d",NA,NA,"d"),
                  x5 = c(NA,NA,NA,NA,"e"))

# code
library(magrittr)
df_a = stringr::str_split(x,"/") %>% 
  purrr::keep(~ all(!is.na(.))) %>% 
  purrr::map(function(y){
    res = data.frame(matrix(y,nrow = 1),stringsAsFactors = FALSE)
    colnames(res) = glue::glue("x{1:length(y)}")
    res
  }) %>% 
  dplyr::bind_rows() %>% 
  purrr::map(as.factor) %>% 
  as.data.frame()

# test
identical(df_a,df_q)
#> [1] TRUE

Created on 2019-06-02 by the reprex package (v0.2.1)

David Mas
  • 1,149
  • 2
  • 12
  • 18
0

We can use read.table from base R after pasteing the strings into a single string delimited by \n

read.table(text = paste(x[-1], collapse="\n"), 
   header = FALSE, stringsAsFactors = FALSE, sep="/", fill = TRUE, na.strings = "")
#  V1 V2   V3   V4   V5
#1  a  b <NA> <NA> <NA>
#2  a  b    c    d <NA>
#3  a  b    c <NA> <NA>
#4  a  c <NA> <NA> <NA>
#5  a  b    c    d    e

Or with fread from data.table

library(data.table)
fread(paste(x[-1], collapse="\n"), sep="/", header = FALSE, fill = TRUE)
akrun
  • 874,273
  • 37
  • 540
  • 662