Agregating and counting elements in the variables of a dataset

Question

I might have not asked the proper question in my research, sorry in such case.

I have a multiple columns dataset:

helena <- 


   Year   US$  Euros Country Regions
   2001   12   13    US      America 
   2000   13   15    UK      Europe
   2003   14   19    China   Asia

I want to group the dataset in a way that I have for each region the total per year of the earnings plus a column showing how many countries have communicated their data per region every year

helena <- 


   Year   US$  Euros  Regions   Number of countries per region per Year
   2000   150   135   America    2
   2001   135   151   Europe     15
   2002   142   1900  Asia       18

Yet, I have tried

      count(helena, c("Regions", "Year"))

but it does not work properly since includes only the columns indicated

Gregor Thomas · Answer 1 · 2019-09-18T15:49:00.593

1

With dplyr:

library(dplyr)

your_data %>%
  group_by(Regions, Year) %>%
  summarize(
    US = sum(US), 
    Euros = sum(Euros),
    N_countries = n_distinct(Country)
  )

edited Sep 18 '19 at 15:49

answered Sep 18 '19 at 14:44

Gregor Thomas

136,190
20
167
294

Hi. Thanks Nevertheless it gives me the general total (i.e. only one line), not the total by Region and Year.. – Helena Sep 19 '19 at 08:44
That means you've loaded `plyr` after `dplyr`, so you're using the wrong version of `summarize`. You can specify with `dplyr::summarize(...)`. See [this faq](https://stackoverflow.com/q/26106146/903061). – Gregor Thomas Sep 19 '19 at 13:08

haci · Answer 2 · 2019-09-18T15:16:13.490

1

Here is the data.table way, I have added a row for Canada for year 2000 to test the code:

library(data.table)

df <- data.frame(Year = c(2000, 2001, 2003,2000), 
             US = c(13, 12, 14,13), 
             Euros = c(15, 13, 19,15), 
             Country = c('US', 'UK', 'China','Canada'), 
             Regions = c('America', 'Europe', 'Asia','America'))

df <- data.table(df)

df[,
   .(sum_US = sum(US),
     sum_Euros = sum(Euros),
     number_of_countries = uniqueN(Country)),
   .(Regions, Year)]

   Regions Year sum_US sum_Euros number_of_countries
1: America 2000     26        30                   2
2:  Europe 2001     12        13                   1
3:    Asia 2003     14        19                   1

edited Sep 18 '19 at 15:16

answered Sep 18 '19 at 14:56

haci

241
1
8

Hi, thanks This is not accepting the calculation for the number of countries, though. Is it possible any typing mistake? – Helena Sep 19 '19 at 08:50
I just got rid of the dollar sign in the column name for convenience, that is why. Pay attention to that and the code will behave as expected. – haci Sep 19 '19 at 09:06

m070ch · Answer 3 · 2019-09-18T15:07:58.530

0

using tidyr

library(tidyr)

df %>% group_by(Regions, Year) %>%
summarise(Earnings_US = sum(`US$`),
          Earnings_Euros = sum(Euros),
          N_Countries = length(Country))

aggregate the data set by regions, summing the earnings columns and doing a length of the country column (assuming countries are unique)

edited Sep 18 '19 at 15:07

answered Sep 18 '19 at 14:46

m070ch

56
3

Hi. Thanks. Nonetheless this is giving me the general totals, not the breakdown of number of Country occurrences per Year and per Region – Helena Sep 18 '19 at 14:54
didn't realize you needed by region and year, I will add Year to `group_by` – m070ch Sep 18 '19 at 15:07

JFlynn · Answer 4 · 2019-09-18T15:24:33.403

0

Using tidyverse and building the example

library(tidyverse)

df <- tibble(Year = c(2000, 2001, 2003,2000), 
             US = c(13, 12, 14,13), 
             Euros = c(15, 13, 19,15), 
             Country = c('US', 'UK', 'China','Canada'), 
             Regions = c('America', 'Europe', 'Asia','America'))


df %>% 
        group_by(Regions, Year) %>%
        summarise(US = sum(US), 
                Euros = sum(Euros), 
                Countries = n_distinct(Country))

updated to reflect the data in the original question

edited Sep 18 '19 at 15:24

answered Sep 18 '19 at 14:48

JFlynn

344
2
8

Hi. Thanks. Nonetheless even this is giving me the general totals, not the breakdown of number of Country occurrences per Year and per Region – Helena Sep 18 '19 at 14:55
When I run the above code I get the breakdown of country occurrence by year and region. Could you let me know if you are seeing any errors? Or are my assumptions about your data wrong? – JFlynn Sep 18 '19 at 15:17
I just saw the new data. So it now gives an equivalent tidyverse implementation of what you have in the original question. – JFlynn Sep 18 '19 at 15:24
Thanks. The problems is still on, though. I wonder why the last line of code seems to be problematic... – Helena Sep 19 '19 at 08:59

Agregating and counting elements in the variables of a dataset

4 Answers4