
I have a data frame with 500 columns and 50 rows. One column is a categorical variable with 3 categories; the rest are continuous variables. How do I group the data frame by the categorical variable and take the mean of each continuous column within each category? ALSO, I want to remove all NAs. I want to create a new DF from this info.

Best, Henry

Henry IV
  • Welcome to Stack Overflow. I would recommend these guidelines (https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to maximize your chances of getting help. Try to provide some data and some code to illustrate your issue. – DJJ Apr 07 '20 at 22:37

1 Answer


When posting to SO, please make sure to include a reproducible example of your data (dput is helpful for this). As it is, I can only guess at the structure of your data.
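As a minimal sketch of what that looks like: the data frame `df` below is a hypothetical stand-in (its column names `group`, `x1`, `x2` are made up). `dput(head(df, 4))` prints R code that rebuilds the first four rows, and that printed code is what you would paste into the question.

```r
# Hypothetical stand-in for your real data:
# one categorical column plus a couple of numeric columns
df <- data.frame(
  group = c("a", "b", "c", "a", "b", "c"),
  x1 = c(1, 2, 3, 4, NA, 6),
  x2 = c(10, 20, 30, 40, 50, 60)
)

# Print R code that recreates the first few rows;
# copy this output into your post so others can rebuild your data
dput(head(df, 4))
```

With 500 columns you might share only a handful of them, for example `dput(head(df[, 1:6]))`, as long as the categorical column is included.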

I like doing general grouping/summarising operations with dplyr. Using iris as an example, you might be able to do something like this:

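library(dplyr)   # group_by(), summarise_all()
library(tidyr)   # drop_na()
data(iris)

iris %>% 
  drop_na() %>%                # drop every row that contains an NA
  group_by(Species) %>%        # one group per category
  summarise_all(mean)          # mean of every non-grouping column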

summarise_all automatically applies the function you pass it (here, mean) to every non-grouping column.
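To make the NA handling concrete, here is a small sketch on a hypothetical, scaled-down version of the data in the question (one categorical column `group` and two numeric columns `x1`, `x2`; all names and values are made up). Note that `drop_na()` removes a row if it has an NA in *any* column, so a row is lost from every column's mean even when only one of its values is missing.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: one categorical column and numeric columns with some NAs
df <- data.frame(
  group = c("a", "a", "b", "b", "c", "c"),
  x1 = c(1, 3, 5, NA, 9, 11),
  x2 = c(2, 4, 6, 8, NA, 12)
)

# drop_na() removes rows 4 and 5 entirely (each has one NA),
# then summarise_all() takes the mean of x1 and x2 within each group
means_dropped <- df %>%
  drop_na() %>%
  group_by(group) %>%
  summarise_all(mean)

means_dropped
```

Here group `b` gets an `x2` mean of 6 rather than 7, because the row holding the 8 was dropped for its missing `x1`.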

Note that if you use the development version of dplyr (released as dplyr 1.0.0), you could also do something like:

iris %>% 
  group_by(Species) %>% 
  summarise(across(where(is.numeric), mean))   # apply mean() to every numeric column

I mention this because summarise_all is being superseded in favor of across.
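If you would rather keep every row and ignore NAs column by column, one option (a sketch, assuming dplyr 1.0.0 or later for `across()` and `where()`) is to pass `na.rm = TRUE` to `mean()` inside `across()`. The data frame below is hypothetical, with made-up column names `group`, `x1`, `x2`.

```r
library(dplyr)

# Hypothetical data: one categorical column and numeric columns with some NAs
df <- data.frame(
  group = c("a", "a", "b", "b", "c", "c"),
  x1 = c(1, 3, 5, NA, 9, 11),
  x2 = c(2, 4, 6, 8, NA, 12)
)

# where(is.numeric) selects every numeric column;
# na.rm = TRUE skips NAs within each column instead of dropping whole rows
means_kept <- df %>%
  group_by(group) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

means_kept
```

Compared with `drop_na()`, this uses all 8 in group `b` (giving an `x2` mean of 7) because only the missing `x1` value is skipped. Wrap the result in `as.data.frame()` if you need a plain data frame rather than a tibble.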

Conor Neilson