1

I might have not asked the proper question in my research, sorry in such case.

I have a multiple columns dataset:

helena <- 


   Year   US$  Euros Country Regions
   2001   12   13    US      America 
   2000   13   15    UK      Europe
   2003   14   19    China   Asia 

I want to group the dataset in a way that I have for each region the total per year of the earnings plus a column showing how many countries have communicated their data per region every year

helena <- 


   Year   US$  Euros  Regions   Number of countries per region per Year
   2000   150   135   America    2
   2001   135   151   Europe     15
   2002   142   1900  Asia       18  

Yet, I have tried

      count(helena, c("Regions", "Year"))

but it does not work properly since includes only the columns indicated

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
Helena
  • 87
  • 9

4 Answers4

1

With dplyr:

library(dplyr)

your_data %>%
  group_by(Regions, Year) %>%
  summarize(
    US = sum(US), 
    Euros = sum(Euros),
    N_countries = n_distinct(Country)
  )
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • Hi. Thanks Nevertheless it gives me the general total (i.e. only one line), not the total by Region and Year.. – Helena Sep 19 '19 at 08:44
  • That means you've loaded `plyr` after `dplyr`, so you're using the wrong version of `summarize`. You can specify with `dplyr::summarize(...)`. See [this faq](https://stackoverflow.com/q/26106146/903061). – Gregor Thomas Sep 19 '19 at 13:08
1

Here is the data.table way, I have added a row for Canada for year 2000 to test the code:

library(data.table)

df <- data.frame(Year = c(2000, 2001, 2003,2000), 
             US = c(13, 12, 14,13), 
             Euros = c(15, 13, 19,15), 
             Country = c('US', 'UK', 'China','Canada'), 
             Regions = c('America', 'Europe', 'Asia','America'))

df <- data.table(df)

df[,
   .(sum_US = sum(US),
     sum_Euros = sum(Euros),
     number_of_countries = uniqueN(Country)),
   .(Regions, Year)]

   Regions Year sum_US sum_Euros number_of_countries
1: America 2000     26        30                   2
2:  Europe 2001     12        13                   1
3:    Asia 2003     14        19                   1
haci
  • 241
  • 1
  • 8
  • Hi, thanks This is not accepting the calculation for the number of countries, though. Is it possible any typing mistake? – Helena Sep 19 '19 at 08:50
  • I just got rid of the dollar sign in the column name for convenience, that is why. Pay attention to that and the code will behave as expected. – haci Sep 19 '19 at 09:06
0

using tidyr

library(tidyr)

df %>% group_by(Regions, Year) %>%
summarise(Earnings_US = sum(`US$`),
          Earnings_Euros = sum(Euros),
          N_Countries = length(Country))

aggregate the data set by regions, summing the earnings columns and doing a length of the country column (assuming countries are unique)

m070ch
  • 56
  • 3
  • Hi. Thanks. Nonetheless this is giving me the general totals, not the breakdown of number of Country occurrences per Year and per Region – Helena Sep 18 '19 at 14:54
  • didn't realize you needed by region and year, I will add Year to `group_by` – m070ch Sep 18 '19 at 15:07
0

Using tidyverse and building the example

library(tidyverse)

df <- tibble(Year = c(2000, 2001, 2003,2000), 
             US = c(13, 12, 14,13), 
             Euros = c(15, 13, 19,15), 
             Country = c('US', 'UK', 'China','Canada'), 
             Regions = c('America', 'Europe', 'Asia','America'))


df %>% 
        group_by(Regions, Year) %>%
        summarise(US = sum(US), 
                Euros = sum(Euros), 
                Countries = n_distinct(Country))

updated to reflect the data in the original question

JFlynn
  • 344
  • 2
  • 8
  • Hi. Thanks. Nonetheless even this is giving me the general totals, not the breakdown of number of Country occurrences per Year and per Region – Helena Sep 18 '19 at 14:55
  • When I run the above code I get the breakdown of country occurrence by year and region. Could you let me know if you are seeing any errors? Or are my assumptions about your data wrong? – JFlynn Sep 18 '19 at 15:17
  • I just saw the new data. So it now gives an equivalent tidyverse implementation of what you have in the original question. – JFlynn Sep 18 '19 at 15:24
  • Thanks. The problems is still on, though. I wonder why the last line of code seems to be problematic... – Helena Sep 19 '19 at 08:59