How can I reshape a data frame whose data is arranged in "blocks", each x columns wide and y rows long? In the example diagram the starting data frame is two "blocks" wide and three "blocks" long, but the solution should work with other dimensions too. The final "block" may have fewer rows. This data structure comes from the solution posted here: Split dataframe at specifi row and arrange columns into "sections" in R

[Diagram: starting data frame, two "blocks" wide and three "blocks" long]

I tried to reshape with reshape() into long format but could not figure out how to reorder the "blocks".

Sample data with x = 2 and y = 6, two "blocks" wide and two "blocks" long (the last "block" has only three rows):

index v1 index v1
 1    a       7    a
 2    a       8    a
 3    d       9    x
 4    f      10    d
 5    f      11    d
 6    g      12    x
13    e      19    e
14    a      20    e
15    a      21    c
16    d      
17    c      
18    f      

Expected output:

index   v1
1   a
2   a
3   d
4   f
5   f
6   g
7   a
8   a
9   x
10  d
11  d
12  x
13  e
14  a
15  a
16  d
17  c
18  f
19  e
20  e
21  c

1 Answer


R is for wusses. Let's just write it C-style, with explicit loops.

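reblock <- function (data, x, y) {
  # The names of the first x columns become the names of the stacked output
  rn <- names(data)[seq_len(x)]
  reblocked <- as.data.frame(matrix(NA, 0, x))
  names(reblocked) <- rn

  # Take y rows at a time; trailing rows that do not fill a whole block are skipped
  while (nrow(data) >= y) {
    rows <- data[seq_len(y), , drop = FALSE]
    # Peel off x columns at a time and append them below the rows collected so far
    while (ncol(rows) >= x) {
      names(rows)[seq_len(x)] <- rn
      reblocked <- rbind(reblocked, rows[seq_len(x)])
      rows <- rows[-seq_len(x)]
    }
    # remove the y rows that were just processed
    data <- data[-seq_len(y), , drop = FALSE]
  }

  reblocked
}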

# Example: 24 rows and 4 columns, i.e. blocks of x = 2 columns and y = 6 rows
tmp <- data.frame(
  a = rep(1:4, each = 6),
  b = rep(letters[1:4], each = 6),
  c = rep(5:8, each = 6),
  d = rep(letters[5:8], each = 6)
)
reblock(tmp, 2, 6)
dash2
  • The code works as intended. However, I did not anticipate that the final "block" is sometimes incomplete (has fewer rows), and it is currently left out. I added an extra OR condition to the first while loop: "| nrow(data) >= 1" (at least one row is left). This seems to work, but it creates an extra empty "block" of NA rows and is not the cleanest solution. If you have a better fix, feel free to modify the answer. – user2021713 May 01 '21 at 15:34
  • To remove NA rows from the output: reblocked[rowSums(is.na(reblocked)) != ncol(reblocked), ] – user2021713 May 01 '21 at 16:01
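
One possible way to fold the two comments above into a single function is sketched below. It is not the answer author's code: the name reblock_partial is hypothetical, and the sketch assumes that the number of columns is a multiple of x and that padding cells in a short block are NA (not empty strings). It splits the row and column indices into groups, so a final group with fewer than y rows is kept, and then drops rows that are entirely NA.

```r
# Hypothetical helper (not from the original answer): stack x-column blocks,
# block row by block row, keeping a final block that has fewer than y rows.
reblock_partial <- function(data, x, y) {
  # Assumption: the columns divide evenly into blocks of width x
  stopifnot(x >= 1, y >= 1, ncol(data) %% x == 0)
  rn <- names(data)[seq_len(x)]
  if (nrow(data) == 0) {
    return(data[0, seq_len(x), drop = FALSE])
  }

  # Row groups 1..y, (y+1)..2y, ...; the last group may be shorter than y
  row_groups <- split(seq_len(nrow(data)), ceiling(seq_len(nrow(data)) / y))
  # Column groups 1..x, (x+1)..2x, ...
  col_groups <- split(seq_len(ncol(data)), ceiling(seq_len(ncol(data)) / x))

  pieces <- list()
  for (r in row_groups) {
    for (cc in col_groups) {
      block <- data[r, cc, drop = FALSE]
      names(block) <- rn
      pieces[[length(pieces) + 1]] <- block
    }
  }
  out <- do.call(rbind, pieces)

  # Drop padding rows in which every column is NA (same filter as in the comment)
  out <- out[rowSums(is.na(out)) != ncol(out), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# The sample data from the question, with NA as padding in the short block
sample_df <- data.frame(
  index = c(1:6, 13:18),
  v1    = c("a", "a", "d", "f", "f", "g", "e", "a", "a", "d", "c", "f"),
  index = c(7:12, 19:21, NA, NA, NA),
  v1    = c("a", "a", "x", "d", "d", "x", "e", "e", "c", NA, NA, NA),
  check.names = FALSE  # keep the repeated column names as they are
)
result <- reblock_partial(sample_df, 2, 6)
```

Collecting the blocks in a list and calling rbind once avoids growing the data frame inside the loop. If the padding cells are empty strings instead of NA, the final filter would need to be adapted.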