How to implement extract/separate functions (from dplyr and tidyr) to separate a column into multiple columns based on arbitrary values?

Question

I have a column:

Y = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)

I would like to split it into multiple columns, based on the positions of the column values. For instance, I would like:

Y1=c(1,2,3,4,5)
Y2=c(6,7,8,9,10)
Y3=c(11,12,13,14,15)
Y4=c(16,17,18,19,20)

Since I am working with a large time series data set, the divisions will be arbitrary, depending on the length of one time period.

It doesn't look like `R` syntax. If `Y <- 1:20; split(Y, as.integer(gl(length(Y), 5, length(Y))))` — akrun, Feb 12 '19 at 15:58
with `tidyverse` `tibble(Y) %>% group_by(grp = (row_number()-1) %/% 5 + 1) %>% summarise(Y = list(Y))` — akrun, Feb 12 '19 at 16:01
https://stackoverflow.com/questions/3302356/how-to-split-a-data-frame — M--, Feb 12 '19 at 16:02

score 1 · Answer 1 · answered Feb 12 '19 at 16:19

Not a dplyr solution, but I believe the easiest way would involve using matrices.

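foo = function(data, sep.in = 5) {
  # matrix() fills column by column, so nrow = sep.in puts each run of
  # sep.in consecutive values into its own column
  data.matrix = matrix(data, nrow = sep.in)
  data.df = as.data.frame(data.matrix)
  return(data.df)
}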

I have not tested it, but this function should create a data.frame that can be merged into an existing one using cbind().
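As a rough sketch of the cbind() step, assuming the vector length may not be an exact multiple of the chunk size: the helper below pads with NA before reshaping. The names `chunk_columns`, `size` and `existing` are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper: pad with NA so the length is a multiple of `size`,
# then let matrix() fill the values column by column.
chunk_columns <- function(x, size = 5) {
  n_cols <- ceiling(length(x) / size)
  padded <- c(x, rep(NA, n_cols * size - length(x)))
  out <- as.data.frame(matrix(padded, nrow = size))
  names(out) <- paste0("Y", seq_len(n_cols))
  out
}

Y <- 1:20
existing <- data.frame(t = 1:5)   # stand-in for an existing data frame
combined <- cbind(existing, chunk_columns(Y))
combined$Y2
#> [1]  6  7  8  9 10
```

Note that cbind() only lines up if the existing data frame has `size` rows.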

score 1 · Accepted Answer · answered Feb 12 '19 at 16:48

You can use the base split to split this vector into vectors that are each 5 items long. You could also use a variable to store this interval length.

Using rep with each = 5, and creating a sequence programmatically, gets you the numbers 1, 2, ... up to the length divided by 5 (in this case, 4), each repeated 5 times consecutively. Then split returns a list of vectors.

It's worth noting that a variety of SO posts will recommend you store similar data in lists such as this, rather than creating multiple variables, so I'm leaving it in list form here.

Y <- 1:20

breaks <- rep(1:(length(Y) / 5), each = 5)
split(Y, breaks)
#> $`1`
#> [1] 1 2 3 4 5
#> 
#> $`2`
#> [1]  6  7  8  9 10
#> 
#> $`3`
#> [1] 11 12 13 14 15
#> 
#> $`4`
#> [1] 16 17 18 19 20

Created on 2019-02-12 by the reprex package (v0.2.1)
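Storing the interval length in a variable, as mentioned at the top of this answer, could look like the sketch below. The name `window` is just an illustration, and the grouping uses ceiling() instead of rep() so that a final, shorter chunk is still handled when the length is not a multiple of the window.

```r
Y <- 1:20
window <- 5   # illustrative name: the length of one time period

# ceiling(i / window) maps positions 1..window to group 1,
# positions (window + 1)..(2 * window) to group 2, and so on.
groups <- ceiling(seq_along(Y) / window)
chunks <- split(Y, groups)
chunks[["4"]]
#> [1] 16 17 18 19 20
```

Changing `window` is then the only edit needed to get a different division.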

Could you please include some code, on how to get the same in multiple variables. I intend to create a moving window, to visualize multiple plots of these divisions. — DGT, Feb 12 '19 at 19:58
If you need help with a wider scope of the problem, you should update the question to include more data or more situations than what you initially described — camille, Feb 12 '19 at 20:01

akrun · Answer 3 · 2019-02-12T20:03:15.157

We can make use of split (posting the code from my comment as a solution) to split the vector into a list of vectors.

lst <- split(Y, as.integer(gl(length(Y), 5, length(Y))))
lst
#$`1`
#[1] 1 2 3 4 5

#$`2`
#[1]  6  7  8  9 10

#$`3`
#[1] 11 12 13 14 15

#$`4`
#[1] 16 17 18 19 20

Here, gl creates a grouping index from the n, k and length parameters, where n is an integer giving the number of levels, k is an integer giving the number of replications, and length is an integer giving the length of the result.

In our case, we want to have 'k' as 5.

as.integer(gl(length(Y), 5, length(Y)))
#[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4

If we want to have multiple objects in the global environment, use list2env

list2env(setNames(lst, paste0("Y", seq_along(lst))), envir = .GlobalEnv)
Y1
#[1] 1 2 3 4 5
Y2
#[1]  6  7  8  9 10
Y3
#[1] 11 12 13 14 15
Y4
#[1] 16 17 18 19 20
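If the goal is to do the same thing to every chunk (for example, one summary or one plot per window), it may be simpler to keep the named list and loop over it rather than create separate objects. A minimal sketch, assuming a per-chunk mean is the operation of interest:

```r
Y <- 1:20
lst <- split(Y, as.integer(gl(length(Y), 5, length(Y))))
names(lst) <- paste0("Y", seq_along(lst))

# Apply one function to each chunk; vapply() checks that every
# result is a single numeric value.
means <- vapply(lst, mean, numeric(1))
means[["Y3"]]
#> [1] 13
```

The same pattern should work with a plotting function in place of mean, using lapply() since plots return no single value.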

Or, since the OP mentioned dplyr/tidyr in the question, we can use those packages as well:

library(tidyverse)
tibble(Y) %>%
   group_by(grp = (row_number()-1) %/% 5 + 1) %>% 
   summarise(Y = list(Y)) %>%
   pull(Y)
#[[1]]
#[1] 1 2 3 4 5

#[[2]]
#[1]  6  7  8  9 10

#[[3]]
#[1] 11 12 13 14 15

#[[4]]
#[1] 16 17 18 19 20

data

Y <- 1:20
