0

I'm struggling with a sizeable panel data in long format with multiple variables. It looks like this

set.seed(42)

    dat_0=
      data.frame(
      c(rep('AFG',2),rep('UK',2)),
      c(rep(c('GDP','pop'),2)),
      runif(4),
      runif(4))
    colnames(dat_0)<-c('country','variable','2010','2011')

Producing a data frame like this:

  country variable        2010      2011
1     AFG      GDP 0.535761290 0.7515226
2     AFG      pop 0.002272966 0.4527316
3      UK      GDP 0.608937453 0.5357900
4      UK      pop 0.836801559 0.5373767

And I am trying/struggling to coerce it to this structure

    country   year        GDP      pop
1     AFG     2010 0.5357612   0.0022729
2     AFG     2011 0.7515226   0.4527316
3      UK     2010 0.6089374   0.8368015
4      UK     2011 0.5357900   0.5373767

Apologies if repeated, I seem to be struggling with reshape/tidyr/dplyr

augusto
  • 3
  • 2

2 Answers2

0

You need to gather and then spread:

library(tidyverse)
set.seed(42)

dat_0 <- data.frame(c(rep("AFG", 2), rep("UK", 2)), c(rep(c("GDP", "pop"), 2)), runif(4), runif(4))
colnames(dat_0) <- c("country", "variable", "2010", "2011")

dat_0 %>%
  gather(year, value, `2010`, `2011`) %>%
  spread(variable, value)
#>   country year       GDP       pop
#> 1     AFG 2010 0.9148060 0.9370754
#> 2     AFG 2011 0.6417455 0.5190959
#> 3      UK 2010 0.2861395 0.8304476
#> 4      UK 2011 0.7365883 0.1346666

Created on 2019-02-20 by the reprex package (v0.2.1)

Calum You
  • 14,687
  • 4
  • 23
  • 42
0

Looks like you could solve your problem with a mix from spread and gather functions from the tidyverse package.

Edit: actually the package is tidyr, which is part of the tidyverse package

You can solve this problem in two steps.

First: gather by year and values, creating a new column called "measurement"

> dat_1 <- dat_0 %>% gather(key="year",value="measurement","2010":"2011")
> dat_1
  country variable year measurement
1     AFG      GDP 2010   0.9148060
2     AFG      pop 2010   0.9370754
3      UK      GDP 2010   0.2861395
4      UK      pop 2010   0.8304476
5     AFG      GDP 2011   0.6417455
6     AFG      pop 2011   0.5190959
7      UK      GDP 2011   0.7365883
8      UK      pop 2011   0.1346666

Second: spread by your new "variable" and "measurement"

> dat_2 <- dat_1 %>% spread(key="variable",value="measurement")
> dat_2
  country year       GDP       pop
1     AFG 2010 0.9148060 0.9370754
2     AFG 2011 0.6417455 0.5190959
3      UK 2010 0.2861395 0.8304476
4      UK 2011 0.7365883 0.1346666

I sincerly hope this solves your problem.

Just Burfi
  • 159
  • 9
  • It's a technicality, but `spread` and `gather` are both in `tidyr`, one of the packages installed & loaded by `tidyverse` – camille Feb 20 '19 at 19:40
  • Also, this is more or less identical to @CalumYou's answer – camille Feb 20 '19 at 19:41
  • @camille That's right, we might have answered at the same time since the question is quite fresh (maybe I took myself more time to type the answer in a more educational way). You're also right about the package. – Just Burfi Feb 20 '19 at 19:47