0

I am looking for a solution to split dataframes into smaller parts with specified length and then rearrange those parts into a new dataframe. The goal is to fit more data on a single page when printing narrow tables. For example this:

index   v1
1   a
2   a
3   d
4   f
5   f
6   g
7   a
8   a
9   x
10  d
11  d
12  x
13  e
14  a
15  a
16  d
17  c
18  f
19  e
20  e
21  c

into this:

index   v1  index   v1
1   a   7   a
2   a   8   a
3   d   9   x
4   f   10  d
5   f   11  d
6   g   12  x
13  e   19  e
14  a   20  e
15  a   21  c
16  d       
17  c       
18  f       

In this example we split after every 6th row and arrange into two column sections (if we had a page that could fit 6 rows only). It would be nice to have split points and column section counts as variables. And how to reverse the process to make a regular table again for data analysis after extra data has been entered in Excel for example?

2 Answers2

1

You could sinmpy do

after <- 6
cols <- 2
f <- ((seq(nrow(df)) - 1) %/% after) %% cols
d <- reshape(cbind(df, time = f, id = ave(f, f, FUN=seq)), dir="wide")

d[-1]

   index.0 v1.0 index.1 v1.1
1        1    a       7    a
2        2    a       8    a
3        3    d       9    x
4        4    f      10    d
5        5    f      11    d
6        6    g      12    x
13      13    e      19    e
14      14    a      20    e
15      15    a      21    c
16      16    d      NA <NA>
17      17    c      NA <NA>
18      18    f      NA <NA>

The reverse operation will be:

 reshape(d)[-(1:2)]
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • Works as needed. Renaming with repeated original names was easy. Would you be so kind and add the reverse operation as-well? To transform the result back to original. – user2021713 Apr 23 '21 at 21:17
  • @user2021713 Have added the reverse operation – Onyambu Apr 23 '21 at 21:41
  • reshape(d)[-(1:2)] gives me an error: Error in match.arg(direction, c("wide", "long")) : argument "direction" is missing, with no default. Simply adding dir="long" is not enough. – user2021713 Apr 24 '21 at 04:28
  • @user2021713 do nit change d. Let d be exactly as it is. What you probably have to do is to assign d to another variable eg, e then change e as you qant, as long as you do not change d – Onyambu Apr 25 '21 at 05:03
  • well the reshape() function from stats package has a direction argument with no default value. So getting this error makes sense. I have not changed object d. – user2021713 Apr 25 '21 at 07:32
  • @user2021713 yes it has the direction and other parameters, although you can directly reshape an object that was previously reshaped since the direction is stored as an attribute in the object. When i mean change the object is by changing any of its attributes including names. You should not even change the names for example run `d<-reshape...` as line of code as shown in my code above and immediately after that run `reshape(d)` it should transform it – Onyambu Apr 26 '21 at 03:09
  • I see. Thank you for explaining. Unfortunately this method is not useful for me as I need to modify a dataframe that was created without the reshape function directly before. It should work with any dataframe that has this structure. What attributes I need to add to make it work? – user2021713 Apr 26 '21 at 06:29
  • @user2021713 You could add the time variable and the direction = "long" – Onyambu Apr 26 '21 at 21:53
0

Not pretty...

The cbindPad function (to fill incomplete columns) comes from combining two data frames of different lengths

library(tidyverse)

cbindPad <- function(...){
  args <- list(...)
  n <- sapply(args,nrow)
  mx <- max(n)
  pad <- function(x, mx){
    if (nrow(x) < mx){
      nms <- colnames(x)
      padTemp <- matrix(NA, mx - nrow(x), ncol(x))
      colnames(padTemp) <- nms
      if (ncol(x)==0) {
        return(padTemp)
      } else {
        return(rbind(x,padTemp))
      }
    }
    else{
      return(x)
    }
  }
  rs <- lapply(args,pad,mx)
  return(do.call(cbind,rs))
}


df <- tribble(
~index, ~ v1,
1,   "a",
2,   "a",
3,   "d",
4,   "f",
5,   "f",
6,   "g",
7,   "a",
8,   "a",
9,   "x",
10,  "d",
11,  "d",
12,  "x",
13,  "e",
14,  "a",
15,  "a",
16,  "d",
17,  "c",
18,  "f",
19,  "e",
20,  "e",
21,  "c") 

df %>% 
  group_split(ceiling(index / 6), .keep = FALSE) %>% 
  invoke(cbindPad, .)

We get :

index v1 index v1 index v1 index   v1
1     1  a     7  a    13  e    19    e
2     2  a     8  a    14  a    20    e
3     3  d     9  x    15  a    21    c
4     4  f    10  d    16  d    NA <NA>
5     5  f    11  d    17  c    NA <NA>
6     6  g    12  x    18  f    NA <NA>
mdag02
  • 1,035
  • 9
  • 16
  • This result is 4-wide on one "page". How to modify to get 2-wide or other widths? – user2021713 Apr 23 '21 at 20:21
  • 1
    change 6 : use 12. Depending of your objective, manipulating datframe like this is maybe not optimal ; consider looking at knitr::kable or DT::datatable for displaying – mdag02 Apr 23 '21 at 20:32