0

I am trying to aggregate data to a higher level for more than 20 columns, I am aware of the regular method in which we write the names one after the other. This is quite repetitive and time consuming. I was wondering if its possible to aggregate the data in such a manner that the column names remain the same.

I thought about performing this by range function but I don't think it is possible while summarizing like its is possible while using select().

Attached is the data frame which I am working on.

structure(list(Type = c("A", "B", "C", "C", "A", "B"), `Row Labels` = c("X6", 
"X7", "X8", "X9", "X11", "X12"), `2022-11-07` = c(25, 50, 75, 
100, 25, 50), `2022-11-14` = c(25, 50, 75, 100, 25, 50), `2022-11-21` = c(25, 
50, 75, 100, 50, 100), `2022-11-28` = c(25, 50, 75, 100, 75, 
150), `2022-12-05` = c(40, 50, 75, 100, 125, 250), `2022-12-12` = c(40, 
50, 75, 100, 200, 400), `2022-12-19` = c(40, 50, 75, 100, 325, 
650), `2022-12-26` = c(40, 50, 75, 100, 525, 1050), `2023-01-02` = c(100, 
110, 120, 130, 850, 1700), `2023-01-09` = c(100, 110, 120, 130, 
1375, 2750), `2023-01-16` = c(100, 110, 120, 130, 2225, 4450), 
    `2023-01-23` = c(100, 110, 120, 130, 3600, 7200), `2023-01-30` = c(100, 
    110, 120, 130, 5825, 11650)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -6L))

enter image description here Attached is the expected output:

structure(list(Type = c("A", "B", "C"), `2022-11-07` = c(50, 
100, 175), `2022-11-14` = c(50, 100, 175), `2022-11-21` = c(75, 
150, 175), `2022-11-28` = c(100, 200, 175), `2022-12-05` = c(165, 
300, 175), `2022-12-12` = c(240, 450, 175), `2022-12-19` = c(365, 
700, 175), `2022-12-26` = c(565, 1100, 175), `2023-01-02` = c(950, 
1810, 250), `2023-01-09` = c(1475, 2860, 250), `2023-01-16` = c(2325, 
4560, 250), `2023-01-23` = c(3700, 7310, 250), `2023-01-30` = c(5925, 
11760, 250)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L))

enter image description here

The code which I was wrote is:

library(readxl)
library(dplyr)

Book1 <- read_excel("C:/X/X/X- X/X/Book1.xlsx",sheet = 4)
Book2 <- read_excel("C:/X/X/X- X/X/Book1.xlsx",sheet = 5)

First <- colnames(Book1)[3]
Last  <- tail(colnames(Book1),1) 

Book <- Book1 %>% 
  group_by(Type) %>% 
  summarise(Date1 = sum(`2022-11-07`),Date2 = sum(`2022-11-14`)) 

Is there any easier way, where I don't have to write the column names and rather it can be dynamically found out and used, like example summarise(any_of(First):any_of(Last)) ofourse this doesnt work but I am looking for something like this. I have attached images for better visualization. Can help is appreciated.

user20203146
  • 447
  • 7
  • 1
    Use `across()`: e.g,. ```Book %>% group_by(Type) %>% summarize(across(!`Row Labels`, sum))```. You can use any [tidyselect](https://tidyselect.r-lib.org/reference/language.html) expression in place of ```!`Row Labels` ```, e.g., `everything()` or `any_of()`. [Also see.](https://stackoverflow.com/questions/21644848/summarizing-multiple-columns-with-dplyr) – zephryl Feb 23 '23 at 18:34

0 Answers0