I am trying to extract a data frame of summarised values from a larger data frame as given below:

Year = c(2000,2001,2002,2003,2000,2001,2002,2003,2000,2001,2002,2003,2000,2001,2002,2003)
Country_Name = rep(c("Afghanistan", "Brazil", "Germany", "Italy"), each=4)
total_population = c(10,15,13,12,11,16,14,13,12,17,15,14,13,18,16,15)
Life_expectancy = c(60,67,70,73,61,68,71,74,62,69,72,75,2,69,72,75)
Hospitals = rep(c(5,10,15,13), each=4)
GDP = rep(c(5,7,10,8), each=4)

Example <- data.frame(Year=Year, Country=Country_Name, Population=total_population, Life=Life_expectancy, Hospitals=Hospitals, GDP=GDP)

Example

What I would like to do is summarise this data frame and extract the mean population, mean GDP, and so on by country. I was able to write a function that produces the output for one country at a time:

country_func <- function(x, df){
  #take summary statistics of 4 variables from tidy dataset
  #x denotes the country name
  
  #ave population
  ave_population <- mean(df$`Population`[df$`Country`==x], na.rm=TRUE)
  #median life
  median_life <- median(df$`Life`[df$`Country`==x], na.rm=TRUE)
  #ave hospital beds per 1000
  ave_hospital_beds <- mean(df$`Hospitals`[df$`Country`==x], na.rm=TRUE)
  #mean GDP
  mean_GDP <- mean(df$`GDP`[df$`Country`==x], na.rm=TRUE )
  
  data.frame(ave_population=ave_population, median_life=median_life, ave_hospital_beds=ave_hospital_beds, mean_GDP=mean_GDP)
}

country_func('Afghanistan', Example)

But what I would like is for the function to return a single data frame with those summary statistics for all four countries at once, i.e. one data frame of 4 rows (one per country), rather than calling it separately for each country.

Thanks in advance.

Richard
  • https://stackoverflow.com/q/11562656/3358272, https://stackoverflow.com/q/1660124/3358272, https://stackoverflow.com/q/12064202/3358272 (multiple functions) – r2evans Jul 15 '23 at 16:47
  • Base: `aggregate(. ~ Country, data = Example[,-1], FUN = function(x) c(mu = mean(x), med = median(x) ) )`; that gives you both statistics for every other variable, but you can subset if needed. dplyr: `Example %>% group_by(Country) %>% summarize(across(c(Population, Hospitals, GDP), ~ mean(., na.rm = TRUE), .names = "ave_{.col}"), median_life = median(Life, na.rm = TRUE))`. – r2evans Jul 15 '23 at 16:51
  • If those links and my comment do not resolve your issue, please @-ping me and we'll discuss any gaps, reopening if needed. Hope this helps! – r2evans Jul 15 '23 at 16:52
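
A minimal base R sketch along the lines of the `aggregate` comment above, restricted to the statistics asked for in the question. The names `means`, `medians` and `by_country` are only illustrative. Note that the formula method of `aggregate` drops rows containing `NA` by default, which is harmless here because the example data has no missing values.

```r
# Rebuild the example data from the question so the block runs on its own
Example <- data.frame(
  Year       = rep(c(2000, 2001, 2002, 2003), times = 4),
  Country    = rep(c("Afghanistan", "Brazil", "Germany", "Italy"), each = 4),
  Population = c(10,15,13,12,11,16,14,13,12,17,15,14,13,18,16,15),
  Life       = c(60,67,70,73,61,68,71,74,62,69,72,75,2,69,72,75),
  Hospitals  = rep(c(5, 10, 15, 13), each = 4),
  GDP        = rep(c(5, 7, 10, 8), each = 4)
)

# Means of three columns per country (illustrative object names)
means <- aggregate(cbind(Population, Hospitals, GDP) ~ Country,
                   data = Example, FUN = mean)

# Median of Life per country
medians <- aggregate(Life ~ Country, data = Example, FUN = median)

# One row per country with all four statistics
by_country <- merge(means, medians, by = "Country")
by_country
```

The columns keep their original names here; rename them with `names(by_country) <- ...` if you want the `ave_`/`median_` prefixes.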

1 Answer

Please try the code below:

library(tidyverse)

Example %>% 
  # .by needs dplyr >= 1.1.0; {.col} is the documented placeholder for the column name
  summarise(across(c(Population, Hospitals, GDP), ~ mean(.x, na.rm = TRUE), .names = 'mean_{.col}'),
            across(Life, ~ median(.x, na.rm = TRUE), .names = 'median_{.col}'), .by = Country)

Created on 2023-07-15 with reprex v2.0.2

# A tibble: 4 × 5
  Country     mean_Population mean_Hospitals mean_GDP median_Life
  <chr>                 <dbl>          <dbl>    <dbl>       <dbl>
1 Afghanistan            12.5              5        5        68.5
2 Brazil                 13.5             10        7        69.5
3 Germany                14.5             15       10        70.5
4 Italy                  15.5             13        8        70.5
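
If you would rather keep the function from the question, one possible sketch is to call it once per country and bind the rows. `all_countries_func` is a hypothetical wrapper name, not something from the original post, and `country_func` is restated in a slightly condensed form so the block runs on its own.

```r
# Rebuild the example data from the question
Example <- data.frame(
  Year       = rep(c(2000, 2001, 2002, 2003), times = 4),
  Country    = rep(c("Afghanistan", "Brazil", "Germany", "Italy"), each = 4),
  Population = c(10,15,13,12,11,16,14,13,12,17,15,14,13,18,16,15),
  Life       = c(60,67,70,73,61,68,71,74,62,69,72,75,2,69,72,75),
  Hospitals  = rep(c(5, 10, 15, 13), each = 4),
  GDP        = rep(c(5, 7, 10, 8), each = 4)
)

# The question's function, condensed: summary statistics for one country x
country_func <- function(x, df) {
  rows <- df$Country == x
  data.frame(
    ave_population    = mean(df$Population[rows], na.rm = TRUE),
    median_life       = median(df$Life[rows], na.rm = TRUE),
    ave_hospital_beds = mean(df$Hospitals[rows], na.rm = TRUE),
    mean_GDP          = mean(df$GDP[rows], na.rm = TRUE)
  )
}

# Hypothetical wrapper: apply country_func to every country and stack the rows
all_countries_func <- function(df) {
  countries   <- unique(df$Country)
  per_country <- lapply(countries, country_func, df = df)
  data.frame(Country = countries, do.call(rbind, per_country))
}

all_countries_func(Example)
```

This returns one data frame with four rows, one per country, and should agree with the `summarise()` output above.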
jkatam