R newbie here. I am working on a project for which I need to combine multiple years of data into a single summary statistic for each column. For example, I have five years' worth of data that need to be averaged, with several columns for different variables. The example provided in ModernDive works:

summary_monthly_temp <- weather %>% 
 group_by(month) %>% 
 summarize(mean = mean(temp, na.rm = TRUE), 
 std_dev = sd(temp, na.rm = TRUE)
 ) 

summary_monthly_temp

Then I modified it to fit my needs:

summary <- filename %>% 
 group_by(country) %>% 
 summarize(mean = mean(gdp, na.rm = TRUE), 
 std_dev = sd(gdp, na.rm = TRUE)
 )

But within the summarize function, I need to summarize a few more variables such as population (getting the mean population) and total gdp.

What is the best way to do this?

I tried something like this but it is not working:

summary<- filename%>% 
 group_by(country) %>% 
 summarize(mean = mean(gdp, na.rm = TRUE), 
  std_dev = sd(gdp, na.rm = TRUE))%>%
 summarize(mean = mean(pop, na.rm = TRUE), 
 std_dev = sd(pop, na.rm = TRUE))%>%

I think I know why: I am piping one summarize call into the other.

Thanks for your input!

Shale
  • Hi, welcome to SO. May I recommend that you read https://stackoverflow.com/help/minimal-reproducible-example on how to post questions? – MatthewR Nov 12 '19 at 14:51
  • Does this answer your question? [How to sum a variable by group](https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group) – camille Nov 12 '19 at 15:26
  • Your code is not copied completely (it stops at `%>%`). It would be good if you [edit] your question to make it readable. Also, what does "it is not working" mean? Wrong results? An error? What message? – akraf Nov 12 '19 at 16:31

1 Answer

First and foremost, you don't usually need to save data after applying a summarize function, because its main use is to generate a summary of your data as output on the console.

Now looking at your code, I see an issue:

filename %>% 
 group_by(country) %>% 
 summarize(
   mean = mean(gdp, na.rm = TRUE), 
   std_dev = sd(gdp, na.rm = TRUE)
 )

The problem seems to be the object called "filename": you need to import it explicitly as an R object in your workspace. This guide should help you import data from local files: https://github.com/rstudio/cheatsheets/raw/master/data-import.pdf
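As a minimal sketch of that import step, assuming the data lives in a CSV file (the question does not say which format is used, and the columns below are made up for illustration), base R's read.csv() returns a data frame that can then be piped into group_by(). The example writes a tiny temporary file so it runs on its own; in practice you would pass the path to your own file instead.

```r
# Hypothetical toy data written to a temporary CSV so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "country,year,gdp,pop",
  "A,2015,100,10",
  "A,2016,120,12",
  "B,2015,200,20",
  "B,2016,NA,22"
), csv_path)

# read.csv() returns a data frame; the literal string NA is read as a missing value.
filename <- read.csv(csv_path)

# Check that the object exists in the workspace and has the expected columns.
str(filename)
```

Once the object exists as a data frame, the pipeline above has something to operate on.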

Now, regarding the usage of summarize: as your example shows, you can have multiple outputs in a single call. Let's assume your data frame has a variable named "pop":

actually_a_dataframe %>% 
 group_by(country) %>% 
 summarize(
   mean_gdp = mean(gdp, na.rm = TRUE), 
   std_dev_gdp = sd(gdp, na.rm = TRUE),
   mean_pop = mean(pop, na.rm = TRUE), 
   std_dev_pop = sd(pop, na.rm = TRUE)
 )

This would produce a mean and a standard deviation for both gdp and pop, for each country.
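To make this concrete, here is a small runnable sketch on made-up data (the data frame `toy` and its values are hypothetical, not taken from the question). It also adds the total gdp the question asks about, using sum().

```r
library(dplyr)

# Hypothetical toy data: two countries, two years each, one missing gdp value.
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2015, 2016, 2015, 2016),
  gdp     = c(100, 120, 200, NA),
  pop     = c(10, 12, 20, 22)
)

# One summarize() call, one named output per statistic.
country_summary <- toy %>%
  group_by(country) %>%
  summarize(
    mean_gdp  = mean(gdp, na.rm = TRUE),
    total_gdp = sum(gdp, na.rm = TRUE),  # total over the years, ignoring NA
    mean_pop  = mean(pop, na.rm = TRUE),
    sd_pop    = sd(pop, na.rm = TRUE)
  )

country_summary
```

This also explains why chaining two summarize() calls fails: the first call returns a table that contains only country and the newly computed columns, so pop no longer exists when the second call runs. If the list of variables grows, dplyr's across() (available since dplyr 1.0.0) can apply the same functions to several columns at once.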