0

I have an issue and I don't know if there's a way out. I have a large spreadsheet (20000 rows and 500 columns) that I need to manipulate, where data was put into additional columns when it should've been put into a 2nd and 3rd row instead. Each entry spans one large row instead of 3 small rows as it should be. So now I need to create two new rows after each existing row. Then take data in columns C-H through C-L, and move it to the following rows.

C-A  C-B  C-C  C-D  C-E  C-F  C-G  C-H  C-I  C-J  C-K  C-L 
A1   A2   A3   A4   A5   A6   A7   A8   A9   A10  A11  A12
B1   B2   B3   B4   B5   B6   B7   B8   B9   B10  B11  B12

How do I move observations in C-H through C-L to a new row with some of the existing data so it looks like this:

C-A  C-B  C-C  C-D  C-E  C-F  C-G  C-H  C-I  C-J  C-K  C-L 
A1   A2   A3   A4   A5   A6   
A1   A2   A3   A7   A8   A9
A1   A2   A3   A10  A11  A12
B1   B2   B3   B4   B5   B6   
B1   B2   B3   B7   B8   B9
B1   B2   B3   B10  B11  B12

And so on, until 20,000 rows becomes 60,000 rows. Is there any help for me out there?

Thank you in advance!

stefan
  • 90,330
  • 6
  • 25
  • 51

1 Answers1

3

One way would be to get data in long format, create a group of 3 values each and get data in wide format again.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -(`C-A`:`C-C`)) %>%
  group_by(`C-A`, `C-B`, `C-C`) %>%
  mutate(grp = ceiling(row_number()/3), 
         name = rep(paste0('col', 1:3), length.out = n())) %>%
  pivot_wider() %>%
  ungroup %>%
  select(-grp)

# `C-A` `C-B` `C-C` col1  col2  col3 
#  <chr> <chr> <chr> <chr> <chr> <chr>
#1 A1    A2    A3    A4    A5    A6   
#2 A1    A2    A3    A7    A8    A9   
#3 A1    A2    A3    A10   A11   A12  
#4 B1    B2    B3    B4    B5    B6   
#5 B1    B2    B3    B7    B8    B9   
#6 B1    B2    B3    B10   B11   B12  

data

df <- structure(list(`C-A` = c("A1", "B1"), `C-B` = c("A2", "B2"), 
    `C-C` = c("A3", "B3"), `C-D` = c("A4", "B4"), `C-E` = c("A5", 
    "B5"), `C-F` = c("A6", "B6"), `C-G` = c("A7", "B7"), `C-H` = c("A8", 
    "B8"), `C-I` = c("A9", "B9"), `C-J` = c("A10", "B10"), `C-K` = c("A11", 
 "B11"), `C-L` = c("A12", "B12")), class = "data.frame", row.names = c(NA, -2L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213