-1

I have a database with data for about 200 variables for every country in the world, for every year from 1970 to 2019 as a simplified example, the data frame would look something like this:

Data <- data.frame(
    Country = c(rep("Aruba", 5),rep("Afghanistan",5), rep("Angola",5)),
    Year = c(rep(c(2006:2010),3)),
    var1 = c(11.4,   11.1,   10.4,   10.5,   9.98,   10.2,   9.54,   10.6,   11.1,   11.4,   10.7,   9.93,   11.0,   8.98,   10.9),
    var2 = c(64.6,   64.7,   64.8,   65.1,   65.5,   66.1,   66.5,   67.1,   67.6,   68.1,   68.5,   68.8,   69.1,   69.5,   69.8)
    )

I need to do operations on this database, which often include things like finding the change or a variable between two given years for each country In the example above, I could for example need to find the difference between 2007 and 2010 for var1 and var2 for each country, to produce something like this:

       Country diff_var1 diff_var2
1        Aruba     -1.12       0.8
2  Afghanistan       0.8       1.6
3       Angola       0.97        1

Is there any straightforward way of sub-setting the data by country, and then doing operations with the data filtered by year, on R? I don't think dplyr's "group_by" could work?

  • 1
    Please can you show a sample of your data frame and paste it in to the question using dput(), rather than trying to describe it. Thanks. – user438383 Jul 03 '20 at 11:42
  • Please take a look at [How to make a great reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Martin Gal Jul 03 '20 at 11:47
  • Thank you, does that help? Sorry, fairly new at this and am unsure how things are best explained – BelenGiralt Jul 03 '20 at 12:23

1 Answers1

1

If I correctly understand what you're asking:

library(dplyr)

# Fictious data
dat <- 
  expand.grid(
  year = 1970:2019,
  country = c('Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola')
) %>% 
  mutate(
    var1 = rnorm(250, 0, 1),
    var2 = rnorm(250, 0, 1),
    var3 = rnorm(250, 0, 1)
  ) 

dat %>% 
  filter(year == 1970 | year == 2019) %>% 
  group_by(country) %>% 
  summarise_at(vars(var1:var3), diff)

Result:

# A tibble: 5 x 4
  country       var1   var2   var3
  <fct>        <dbl>  <dbl>  <dbl>
1 Afghanistan  0.246  0.848  1.29 
2 Albania     -2.53   3.75  -0.765
3 Algeria      1.96  -1.59   1.51 
4 Andorra      2.14  -0.496 -1.48 
5 Angola      -0.673 -0.727 -1.45 
kofm
  • 125
  • 11