
Is it possible to summarise a large number of columns without writing all their names?

My example: I have a data frame (dt) with one categorical column and many numeric columns:

Cat num1 num2 num3 ... num50
a   56   59   67   ... 89
a   46   66   27   ... 59
b   15   9    75   ... 43
b   45   29   35   ... 93

I perform the following operation:

dt %>% group_by(Cat) %>% summarize(num1 = sum(num1), num2 = sum(num2), ... num50= sum(num50))

But writing out all 50 column names takes too long! Can I write this summarize expression more concisely? I tried this variant, but it doesn't work:

dt %>% 
  group_by(Cat) %>% 
  summarize(num1:num50 = sum(c(num1:num50)))

Please help me write this concisely using dplyr or data.table (or other libraries).

camille
    I think this post might answer your question: https://stackoverflow.com/questions/21644848/summarizing-multiple-columns-with-dplyr – Mel G Feb 20 '22 at 21:55

1 Answer

dt %>% group_by(Cat) %>% summarize_all(sum, na.rm = TRUE)

Also, the _all dplyr verbs have been superseded by the use of across, so you can do something like this:

dt %>% group_by(Cat) %>% summarize(across(everything(), ~ sum(.x, na.rm = TRUE)))

Or, if you have other columns as well, you can select just the num columns by prefix, like this:

dt %>% group_by(Cat) %>% summarize(across(starts_with("num"), ~ sum(.x, na.rm = TRUE)))
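To try the across() approach on something small, here is a minimal sketch. The toy values, the extra column named other, and the NA in num3 are all made up for illustration; only Cat and the num prefix come from the question. It assumes dplyr 1.0.0 or later is installed.

```r
library(dplyr)

# Hypothetical toy data: one grouping column, three numeric "num" columns,
# and one extra column ("other") that should not be summed.
dt <- data.frame(
  Cat   = c("a", "a", "b", "b"),
  num1  = c(56, 46, 15, 45),
  num2  = c(59, 66, 9, 29),
  num3  = c(67, 27, 75, NA),
  other = c(1, 2, 3, 4)
)

# Sum every column whose name starts with "num", within each Cat group.
# na.rm = TRUE makes the missing value in num3 count as absent, not as NA.
result <- dt %>%
  group_by(Cat) %>%
  summarize(across(starts_with("num"), ~ sum(.x, na.rm = TRUE)))

print(result)
```

The result has one row per value of Cat and keeps only Cat plus the selected num columns; other is dropped because summarize() returns only the grouping columns and the columns it is asked to compute.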
langtang
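Since the question also mentions data.table, here is a rough equivalent as a sketch rather than a definitive answer. It assumes the data.table package is installed; the toy values and the helper variable num_cols are hypothetical. The columns to sum are picked by matching the num prefix with grep().

```r
library(data.table)

# Hypothetical toy data, converted to a data.table.
dt <- data.table(
  Cat   = c("a", "a", "b", "b"),
  num1  = c(56, 46, 15, 45),
  num2  = c(59, 66, 9, 29),
  other = c("x", "y", "z", "w")
)

# Names of the columns to summarise: everything starting with "num".
num_cols <- grep("^num", names(dt), value = TRUE)

# .SD holds the columns listed in .SDcols for each group;
# lapply() applies sum() to each of them.
result_dt <- dt[, lapply(.SD, sum, na.rm = TRUE), by = Cat, .SDcols = num_cols]

print(result_dt)
```

Restricting .SDcols to the num columns keeps the non-numeric other column out of sum(), which would otherwise raise an error.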