
How can I get the second table (table2) from the first one (table1) without using any loops?

table1 <- data.frame(stringsAsFactors=FALSE,
           x = c("1,2,3"),
           y = c("a,b,c,d"),
           z = c("e,f"))
table1

|x     |y       |z   |
|:-----|:-------|:---|
|1,2,3 |a,b,c,d |e,f |

table2 <- data.frame(stringsAsFactors=FALSE,
           x = c(1, 2, 3, NA),
           y = c("a", "b", "c", "d"),
           z = c("e", "f", NA, NA))
table2

|  x|y  |z  |
|--:|:--|:--|
|  1|a  |e  |
|  2|b  |f  |
|  3|c  |NA |
| NA|d  |NA |

Table 1

Table 2

Emily Kothe
Gonzalo S
  • https://stackoverflow.com/questions/15201305/how-to-convert-a-list-consisting-of-vector-of-different-lengths-to-a-usable-data – LocoGris Feb 19 '19 at 20:52
  • Screenshots are never a good way to share data, because we can't do anything with the data unless we type it out manually. Please provide data in a reproducible and copy&paste-able format using e.g. `dput` (or `dput(head(..., n = 20))` if data is large). Alternatively provide code to generate representative mock-up data. See how to provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more details. – Maurits Evers Feb 19 '19 at 20:57

2 Answers


Here is my attempt at a solution using base R:

x <- data.frame(x = "1,2,3", y = "a,b,c,d", z = "e,f", stringsAsFactors = FALSE)

# split each column at the commas (the data frame has a single row, hence [[1]])
x2 <- lapply(x, function(col) strsplit(col, ",")[[1]])

# find the length of the longest column
L <- max(lengths(x2))

# pad every column to that length, i.e. fill the missing entries with NA
x3 <- lapply(x2, function(v) { length(v) <- L; v })

# cbind them all together and turn the result into a data frame of character columns
x4 <- data.frame(do.call(cbind, x3), stringsAsFactors = FALSE)

It is quite long though. I would be interested to see a better solution.
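The steps above can be wrapped into a single function. The following is only a sketch, not part of the original answer: the name `split_columns` and its `sep` argument are hypothetical, it assumes a one-row data frame of character columns, and it adds a `type.convert()` pass so that `x` ends up numeric (integer) as in the desired `table2`.

```r
# Hypothetical helper (not from the original answer): split every column of a
# one-row data frame on `sep` and pad the shorter pieces with NA.
split_columns <- function(df, sep = ",") {
  # one character vector per column
  pieces <- lapply(df, function(col) strsplit(col, sep, fixed = TRUE)[[1]])
  # length of the longest column
  n <- max(lengths(pieces))
  # extending a vector with `length<-` fills the new slots with NA
  padded <- lapply(pieces, function(p) { length(p) <- n; p })
  out <- data.frame(padded, stringsAsFactors = FALSE)
  # convert columns that look numeric (here: x); others stay character
  out[] <- lapply(out, type.convert, as.is = TRUE)
  out
}

table1 <- data.frame(x = "1,2,3", y = "a,b,c,d", z = "e,f",
                     stringsAsFactors = FALSE)
table2 <- split_columns(table1)
table2
```

It still relies on `lapply()` rather than an explicit `for` loop, which is usually what "without loops" means in R.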

Adam Waring

You can use the stringr package to achieve this:

table1 <- data.frame(stringsAsFactors=FALSE,
                     x = c("1,2,3"),
                     y = c("a,b,c,d"),
                     z = c("e,f"))

t(stringr::str_split_fixed(table1, pattern = ",", max(stringr::str_count(table1, ","))+1))
#>      [,1] [,2] [,3]
#> [1,] "1"  "a"  "e" 
#> [2,] "2"  "b"  "f" 
#> [3,] "3"  "c"  ""  
#> [4,] ""   "d"  ""

Created on 2019-02-20 by the reprex package (v0.2.0).

To break this down into separate steps:

  1. Find the maximum number of items in any column by counting the commas in each column, taking the largest count, and adding 1 (a column has one more item than it has commas).

max(stringr::str_count(table1, ","))+1

  2. Use str_split_fixed() to split each column at the commas, passing the count from the previous step as the number of pieces. Shorter columns are padded with empty strings (""), not NA, as the output above shows.

stringr::str_split_fixed(table1, pattern = ",", max(stringr::str_count(table1, ","))+1)

  3. Use t() to transpose the result so that each original column becomes a column again.

t(stringr::str_split_fixed(table1, pattern = ",", max(stringr::str_count(table1, ","))+1))
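The transposed result is a character matrix padded with empty strings and without column names, so one more step is needed to reach the `table2` from the question. A possible sketch of that step in base R follows. To keep it runnable without stringr, the matrix `m` is rebuilt by hand from the printed output, and the column names are an assumption taken from `table1`.

```r
# The character matrix printed in the answer above, rebuilt by hand so this
# sketch runs without stringr installed. matrix() fills column by column.
m <- matrix(c("1", "2", "3", "",
              "a", "b", "c", "d",
              "e", "f", "",  ""),
            nrow = 4,
            dimnames = list(NULL, c("x", "y", "z")))  # names assumed from table1

# turn the empty-string padding into NA
m[m == ""] <- NA

# matrix -> data frame of character columns, then convert numeric-looking ones
table2 <- as.data.frame(m, stringsAsFactors = FALSE)
table2[] <- lapply(table2, type.convert, as.is = TRUE)
table2
```

Note that this treats every empty string as missing, so a genuinely empty field in the input (for example "1,,3") would also become NA.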

Emily Kothe