Reshape table with multiple columns in groups of 3 into table with 3 columns

Question

I have a data frame with multiple columns and multiple rows, which looks like:

                V1       V2        V3         V4      V5       V6
  1             1         2         3         13      14       15
  2             4         5         6         16      NA       NA 
  3             7         8         9         19      20       21 
  4             10        11        12        22      23       24

And I want to reshape it into:

                V1       V2        V3       
  1             1         2         3         
  2             4         5         6         
  3             7         8         9         
  4             10        11        12       
  5             13        14        15
  6             16        NA        NA 
  7             19        20        21 
  8             22        23        24

In the original data.frame, keep every 3 columns as a group, such that (V1, V2, V3) is group1, (V4, V5, V6) is group2, etc. Then move group2 - without the order of values changing - to the end of group1, and move group3 to the end of group2.

I tried:

  as.data.frame(matrix(unlist(mydata, use.names=FALSE), ncol=3, byrow=TRUE))

but have the value order problem.

How can I get my desired data structure?

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2017-12-21T04:45:55.030

You've already noticed that unlist gives you the values column-wise:

unlist(df[1:3], use.names = FALSE)
##  [1]  1  4  7 10  2  5  8 11  3  6  9 12

To get the values row-wise, you can use the c(t(...)) idiom:

c(t(df[1:3]))
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12

That would allow you to solve the problem in base R using:

as.data.frame(matrix(c(t(df[1:3]), t(df[4:6])), ncol = 3, byrow = TRUE))
##   V1 V2 V3
## 1  1  2  3
## 2  4  5  6
## 3  7  8  9
## 4 10 11 12
## 5 13 14 15
## 6 16 NA NA
## 7 19 20 21
## 8 22 23 24

Generalized as a function, you can try something like:

splitter <- function(indf, ncols) {
  if (ncol(indf) %% ncols != 0) stop("Not the right number of columns to split")
  inds <- split(sequence(ncol(indf)), c(0, sequence(ncol(indf)-1) %/% ncols))
  temp <- unlist(lapply(inds, function(x) c(t(indf[x]))), use.names = FALSE)
  as.data.frame(matrix(temp, ncol = ncols, byrow = TRUE))
}
splitter(df, 3)

A more flexible "data.table" approach would be something like the following:

library(data.table)
rbindlist(split.default(as.data.table(df), 
                        c(0, sequence(ncol(df)-1) %/% 3)), 
          use.names = FALSE)
##    V1 V2 V3
## 1:  1  2  3
## 2:  4  5  6
## 3:  7  8  9
## 4: 10 11 12
## 5: 13 14 15
## 6: 16 NA NA
## 7: 19 20 21
## 8: 22 23 24

Thank you very much! I have tried the last two code with my real-world large data frame, it works amazing! — Duerna, Dec 21 '17 at 05:02
@Duerna, Glad to help! `c` and `matrix` and so on should be very fast, and don't require any packages. `data.table` gives you the advantage of working with a large dataset.... — A5C1D2H2I1M1N2O1R2T1, Dec 21 '17 at 05:07

score 2 · Answer 2 · answered Dec 21 '17 at 03:37

2

I was surprised that no one mentions split.default, which also works on data with more columns:

x <- split.default(df, ceiling(seq_along(df) / 3 ))
do.call(rbind, lapply(x, setNames, names(x[[1]])))

#     V1 V2 V3
# 1.1  1  2  3
# 1.2  4  5  6
# 1.3  7  8  9
# 1.4 10 11 12
# 2.1 13 14 15
# 2.2 16 NA NA
# 2.3 19 20 21
# 2.4 22 23 24

Add make.row.names = FALSE to get rid of the odd row names:

do.call(rbind, c(lapply(x, setNames, names(x[[1]])), list(make.row.names = FALSE)))
#   V1 V2 V3
# 1  1  2  3
# 2  4  5  6
# 3  7  8  9
# 4 10 11 12
# 5 13 14 15
# 6 16 NA NA
# 7 19 20 21
# 8 22 23 24

answered Dec 21 '17 at 03:37

mt1022

16,834
5
48
71

1

`split.default` indeed makes sense here. – A5C1D2H2I1M1N2O1R2T1 Dec 21 '17 at 04:03
`split.default` also makes it really easy to use `data.table`.... – A5C1D2H2I1M1N2O1R2T1 Dec 21 '17 at 04:46

sm925 · Answer 3 · 2017-12-21T01:37:11.790

1

You can solve this problem using data.table:-

df <- data.frame(V1 = c(1, 4, 7, 10), V2 = c(2, 5, 8, 11), V3 = c(3, 6, 9, 12), V4 = c(13, 16, 19, 22), V5 = c(14, NA, 20, 23), V6 = c(15, NA, 21, 24))


library(data.table)
setDT(df)
df1 <- df[, c("V4", "V5", "V6")]
setnames(df1, "V4", "V1")
setnames(df1, "V5", "V2")
setnames(df1, "V6", "V3")
df <- df[, c("V1", "V2", "V3")]
df <- rbind(df, df1)

Output will be:-

   V1 V2 V3
1:  1  2  3
2:  4  5  6
3:  7  8  9
4: 10 11 12
5: 13 14 15
6: 16 NA NA
7: 19 20 21
8: 22 23 24

edited Dec 21 '17 at 01:37

answered Dec 21 '17 at 01:31

sm925

2,648
1
16
28

Thank you! It seems I need a long code since I have 225 columns (75 groups X3). – Duerna Dec 21 '17 at 01:43
1

@Duerna, I've edited [my answer](https://stackoverflow.com/a/47917448/1270695) with a more flexible `data.table` approach. – A5C1D2H2I1M1N2O1R2T1 Dec 21 '17 at 04:51
@ A5C1D2H2I1M1N2O1R2T1 Thank you so much! – Duerna Dec 21 '17 at 05:03

www · Answer 4 · 2017-12-21T02:11:06.977

A solution using dplyr and tidyr.

library(dplyr)
library(tidyr)

dt2 <- dt %>%
  gather(Column, Value) %>%
  extract(Column, into = c("Group", "Index"), regex = "([A-Z+])([\\d].*$)",
          convert = TRUE) %>%
  mutate(Index = Index %% 3) %>%
  mutate(Index = ifelse(Index == 0, 3, Index)) %>%
  unite(Column, c("Group", "Index"), sep = "") %>%
  group_by(Column) %>%
  mutate(ID = 1:n()) %>%
  spread(Column, Value) %>%
  select(-ID)
dt2
# # A tibble: 8 x 3
#      V1    V2    V3
# * <int> <int> <int>
# 1     1     2     3
# 2     4     5     6
# 3     7     8     9
# 4    10    11    12
# 5    13    14    15
# 6    16    NA    NA
# 7    19    20    21
# 8    22    23    24

DATA

dt <- read.table(text = "              V1       V2        V3         V4      V5       V6
  1             1         2         3         13      14       15
                 2             4         5         6         16      NA       NA 
                 3             7         8         9         19      20       21 
                 4             10        11        12        22      23       24",
                 header = TRUE)

Update

Here is an example showing the code will also work on larger data frame.

library(dplyr)
library(tidyr)

# Create example data frame
dt <- as_data_frame(matrix(1:60, ncol = 12, byrow = TRUE))

dt2 <- dt %>%
  gather(Column, Value) %>%
  extract(Column, into = c("Group", "Index"), regex = "([A-Z+])([\\d].*$)",
          convert = TRUE) %>%
  mutate(Index = Index %% 3) %>%
  mutate(Index = ifelse(Index == 0, 3, Index)) %>%
  unite(Column, c("Group", "Index"), sep = "") %>%
  group_by(Column) %>%
  mutate(ID = 1:n()) %>%
  spread(Column, Value) %>%
  select(-ID)
dt2
# # A tibble: 20 x 3
#      V1    V2    V3
# * <int> <int> <int>
#  1     1     2     3
#  2    13    14    15
#  3    25    26    27
#  4    37    38    39
#  5    49    50    51
#  6     4     5     6
#  7    16    17    18
#  8    28    29    30
#  9    40    41    42
# 10    52    53    54
# 11     7     8     9
# 12    19    20    21
# 13    31    32    33
# 14    43    44    45
# 15    55    56    57
# 16    10    11    12
# 17    22    23    24
# 18    34    35    36
# 19    46    47    48
# 20    58    59    60

@Duerna Please see my updated post. I change the regex to be `([A-Z+])([\\d].*$)`. The reason is I believe the original code can only handle small data frames (column number smaller than 10), which is because the regex will create one digit index even if the original column names have more than one digits (`V11` to be `V1`, `V12` to be `V1`, ...). The new regex can capture that. I also added a data frame with 12 columns to show the code works. — www, Dec 21 '17 at 02:14
thank you, I have run both code you provided with my original data, and all the two codes could get a result showed "A tibble: 44,400 x 7", it seems all code could handle large data frames. As for why the "dt2" show like " x 7" , not " x 3", I think the reason is my original data header, all the columns header contain same letter : " factor ,x, y" , like " factor1,x1,y1,factor2,x2,y2,factor3,x3,y3,..." . Anyway, thank you very much for you help. — Duerna, Dec 21 '17 at 02:49
@Duerna `([A-Z+])` is to capture upper case only, so it will not work on your real-world data frame. I did not know your header is more complex than your example data frame. I think for your real-world data, we can use `([A-Za-z].*)` to capture any letters. — www, Dec 21 '17 at 03:05
Thank you. I will pay attention to describe my data frame. And I tried with ([A-Za-z].*), I got " A tibble: 2,368 x 73", the code captured 73 out of 75 from my real-world data header. The 1st and 2nd code you kindly provided works much better with my real-word data. — Duerna, Dec 21 '17 at 03:27

score 0 · Answer 5 · answered Dec 21 '17 at 01:51

Here's a general solution for any number of columns, using dplyr.

Test data data:

# A tibble: 5 x 9
     V1    V2    V3    V4    V5    V6    V7    V8    V9
  <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1     2     3     4     5     6     7     8     9
2    10    11    12    13    14    15    16    17    18
3    19    20    21    22    23    24    25    26    27
4    28    29    30    31    32    33    34    35    36
5    37    38    39    40    41    42    43    44    45

The code:

for (i in seq(1, ncol(data), by = 3)) {
  if (i == 1) {
    out <- select(data, 1:3)
  } else {
    out <- select(data, i:(i+2)) %>% setNames(names(out)) %>% bind_rows(out, .)
  }
}

The output out:

# A tibble: 15 x 3
      V1    V2    V3
   <int> <int> <int>
 1     1     2     3
 2    10    11    12
 3    19    20    21
 4    28    29    30
 5    37    38    39
 6     4     5     6
 7    13    14    15
 8    22    23    24
 9    31    32    33
10    40    41    42
11     7     8     9
12    16    17    18
13    25    26    27
14    34    35    36
15    43    44    45

Reshape table with multiple columns in groups of 3 into table with 3 columns

5 Answers5