
I am trying to get descriptive statistics for multiple variables, compared between two groups. So far the only approach I have found is group_by() followed by summarize(), but writing that out for every variable is a lot of work (see below). Is there a cleaner way of doing this?

grouped_summary <- my_data %>%
  group_by(group) %>%
  summarize(mean_var1 = mean(variable1, na.rm = TRUE),
            median_var1 = median(variable1, na.rm = TRUE),
            sd_var1 = sd(variable1, na.rm = TRUE),
            mean_var2 = mean(variable2, na.rm = TRUE),
            median_var2 = median(variable2, na.rm = TRUE),
            sd_var2 = sd(variable2, na.rm = TRUE),
            count = n())
Phil
  • https://dplyr.tidyverse.org/reference/across.html – Jon Spring Aug 23 '23 at 03:50
  • It might be easier if your data is reshaped from wide to long format, then grouped by variable name. But it's difficult to help without [seeing the data](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – neilfws Aug 23 '23 at 04:54
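Following the reshaping suggestion in the comment above, here is a minimal sketch of the long-format approach, assuming dplyr and tidyr are installed. The data is a hypothetical stand-in using the column names from the question (`group`, `variable1`, `variable2`), with one NA added to show the effect of `na.rm = TRUE`. After pivot_longer(), each statistic only has to be written once, no matter how many variables there are.

```r
library(dplyr)
library(tidyr)

# Hypothetical example data mirroring the column names in the question
my_data <- tibble(
  group = c("group_1", "group_1", "group_2", "group_2"),
  variable1 = c(1, 2, 3, NA),
  variable2 = c(5, 6, 7, 8)
)

long_summary <- my_data %>%
  # Reshape so each row holds one (group, variable, value) observation
  pivot_longer(-group, names_to = "variable", values_to = "value") %>%
  # One summary row per group/variable combination
  group_by(group, variable) %>%
  summarize(mean = mean(value, na.rm = TRUE),
            median = median(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE),
            # Count of non-missing values (n() would count all rows instead)
            n = sum(!is.na(value)),
            .groups = "drop")

print(long_summary)
```

The result has one row per group and variable, which is often easier to read or plot than a single wide row per group.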

2 Answers


As Jon Spring noted, look at the dplyr documentation for across(). In your case, it might look something like this:

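library(tidyverse)

# Small example data set: one grouping column and two numeric variables
my_data <- tibble(
  group = c("group_1", "group_1", "group_2", "group_2"),
  variable1 = c(1, 2, 3, 4),
  variable2 = c(5, 6, 7, 8)
)

my_data %>%
  group_by(group) %>%
  # across() applies every function in the list to each non-grouping column;
  # .names builds the output names, e.g. variable1_mean, variable1_sd
  summarise(across(everything(),
                   list(mean = mean, sd = sd),
                   .names = "{.col}_{.fn}"))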
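The question also uses na.rm = TRUE, a median, and a row count. A sketch of how those might fit into the same across() call, assuming dplyr 1.0 or later (where across() is available) and reusing the toy data above with one NA added. The `~ f(.x, ...)` lambdas are how extra arguments such as na.rm get passed to each function.

```r
library(dplyr)

# Same toy data as above, with one missing value added
my_data <- tibble(
  group = c("group_1", "group_1", "group_2", "group_2"),
  variable1 = c(1, 2, 3, NA),
  variable2 = c(5, 6, 7, 8)
)

grouped_summary <- my_data %>%
  group_by(group) %>%
  summarize(
    across(c(variable1, variable2),
           list(mean = ~ mean(.x, na.rm = TRUE),
                median = ~ median(.x, na.rm = TRUE),
                sd = ~ sd(.x, na.rm = TRUE)),
           .names = "{.col}_{.fn}"),
    count = n()   # rows per group, including rows with missing values
  )

print(grouped_summary)
```

This reproduces the columns of the original hand-written summarize() call, but adding another variable only means adding its name inside c(...).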
Brian Syzdek

Yes. In the dplyr package, summarize() computes summary statistics, and wrapping the columns in across() applies the same set of functions to several variables at once. Here's an example:

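library(dplyr)

# Assuming 'data' is your data frame, 'group' is the grouping column,
# and var1, var2, var3 are the numeric variables to summarize
summary_data <- data %>%
  group_by(group) %>%
  summarize(across(c(var1, var2, var3),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd = ~ sd(.x, na.rm = TRUE),
                        median = ~ median(.x, na.rm = TRUE))),
            count = n())

This code calculates the mean, standard deviation, and median of var1, var2, and var3 for each group in the 'data' data frame, ignoring missing values, and adds the number of rows per group. You can swap in other summary functions as needed.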
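As one possible customization, here is a sketch that selects every numeric column with where(is.numeric) instead of listing them by name, and uses a different set of statistics (min, max, IQR). The data frame `df` and its columns are hypothetical; `label` is a character column included only to show that where(is.numeric) skips it.

```r
library(dplyr)

# Hypothetical data: 'group' is the grouping column, 'label' is non-numeric
df <- tibble(
  group = c("a", "a", "b", "b"),
  label = c("x", "y", "z", "w"),
  var1 = c(1, 2, 3, 4),
  var2 = c(10, 20, 30, 40)
)

summary_all <- df %>%
  group_by(group) %>%
  # where(is.numeric) picks var1 and var2; 'label' is ignored
  summarize(across(where(is.numeric),
                   list(min = ~ min(.x, na.rm = TRUE),
                        max = ~ max(.x, na.rm = TRUE),
                        iqr = ~ IQR(.x, na.rm = TRUE))))

print(summary_all)
```

When .fns is a named list, the default output names follow the "{.col}_{.fn}" pattern, so the columns here are var1_min, var1_max, var1_iqr, and so on.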

Additionally, for a quick overview you can call base R's summary() on a data frame: it reports the minimum, quartiles, median, mean, and maximum of each numeric column (plus the number of NAs, if any), although on its own it does not split the results by group.
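If you want summary() per group without extra packages, one base R option is by(), which splits a data frame by a grouping vector and applies a function to each piece. A minimal sketch, assuming a hypothetical `my_data` with the question's column names:

```r
# Base R only; 'my_data' is a hypothetical stand-in for the question's data
my_data <- data.frame(
  group = c("group_1", "group_1", "group_2", "group_2"),
  variable1 = c(1, 2, 3, NA),
  variable2 = c(5, 6, 7, 8)
)

# summary() alone describes the whole data frame, ignoring groups
print(summary(my_data[, c("variable1", "variable2")]))

# by() splits the selected columns by 'group' and runs summary() on each subset
per_group <- by(my_data[, c("variable1", "variable2")], my_data$group, summary)
print(per_group)
```

The output is printed as text, so this suits a quick look; for a result you can keep working with as a data frame, the dplyr approach is more convenient.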

neilfws