Structure of panel data frame in R - consisting of separate data frames

Question

I have a relatively large data frame in long panel format.

However, I need to make it smaller. It is structured as 10 surveys collapsed together, which means that the same questions (variables) are repeated. This gives me 10 variables measuring the same thing, each of them relevant for only one year.

The structure is like this:

y <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                year = c(2012, 2013, 2014, 2012, 2013, 2014, 2012, 2013, 2014),
                pasta_2012 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                burger_2012 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                pizza_2012 = c(2, 2, 2, 1, 1, 1, 1, 1, 1),
                pasta_2013 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                burger_2013 = c(3, 3, 3, 2, 2, 2, 1, 1, 1),
                pizza_2013 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                pasta_2014 = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                burger_2014 = c(3, 3, 3, 2, 2, 2, 1, 1, 1),
                pizza_2014 = c(1, 1, 1, 2, 2, 2, 3, 3, 3))

> print(y)
  id year pasta_2012 burger_2012 pizza_2012 pasta_2013 burger_2013 pizza_2013 pasta_2014 burger_2014 pizza_2014
1  1 2012          1           1          2          1           3          1          1           3          1
2  1 2013          1           1          2          1           3          1          1           3          1
3  1 2014          1           1          2          1           3          1          1           3          1
4  2 2012          2           2          1          2           2          2          2           2          2
5  2 2013          2           2          1          2           2          2          2           2          2
6  2 2014          2           2          1          2           2          2          2           2          2
7  3 2012          3           3          1          3           1          3          3           1          3
8  3 2013          3           3          1          3           1          3          3           1          3
9  3 2014          3           3          1          3           1          3          3           1          3

What I want is to add three variables and then delete the others afterwards, so that I have only one variable each for pizza, pasta and burger, and each row holds the value for its own year. Something like this:

  id year pasta burger pizza
1  1 2012     1      1     2
2  1 2013     1      3     1
3  1 2014     1      3     1
4  2 2012     2      2     1
5  2 2013     2      2     2
6  2 2014     2      2     2
7  3 2012     3      3     1
8  3 2013     3      1     3
9  3 2014     3      1     3

Does anyone have an idea how to solve this? I have more than 15 variables * 10 for which I need to do this.

score 0 · Answer 1 · answered Aug 25 '21 at 10:44

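Using pivot_longer with the special '.value' name, we can split each column name at the underscore into a variable name and a year, then drop the original year column and keep only the unique rows.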

library(dplyr)
library(tidyr)

y %>%
  pivot_longer(cols = -c(id, year), 
               names_to = c('.value', 'new_year'), 
               names_sep = '_') %>%
  select(-year) %>%
  distinct()

#     id new_year pasta burger pizza
#  <dbl> <chr>    <dbl>  <dbl> <dbl>
#1     1 2012         1      1     2
#2     1 2013         1      3     1
#3     1 2014         1      3     1
#4     2 2012         2      2     1
#5     2 2013         2      2     2
#6     2 2014         2      2     2
#7     3 2012         3      3     1
#8     3 2013         3      1     3
#9     3 2014         3      1     3
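
If you would rather keep the original numeric year column and not rely on the rows being duplicated within each id, one possible variant is to keep only the pivoted rows where the year taken from the column name matches year. This is a sketch under a few assumptions: survey_year is a made-up helper name, names_transform needs tidyr 1.1.0 or later, and y is rebuilt here with rep() (same values as in the question) only so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, written compactly with rep()
y <- data.frame(id          = rep(1:3, each = 3),
                year        = rep(2012:2014, times = 3),
                pasta_2012  = rep(1:3, each = 3),
                burger_2012 = rep(1:3, each = 3),
                pizza_2012  = rep(c(2, 1, 1), each = 3),
                pasta_2013  = rep(1:3, each = 3),
                burger_2013 = rep(3:1, each = 3),
                pizza_2013  = rep(1:3, each = 3),
                pasta_2014  = rep(1:3, each = 3),
                burger_2014 = rep(3:1, each = 3),
                pizza_2014  = rep(1:3, each = 3))

result <- y %>%
  # Split e.g. "pasta_2012" into the variable name (.value) and the survey year
  pivot_longer(cols = -c(id, year),
               names_to = c('.value', 'survey_year'),
               names_sep = '_',
               names_transform = list(survey_year = as.integer)) %>%
  # Keep only the survey that belongs to the row's own year
  filter(year == survey_year) %>%
  # The helper column is no longer needed
  select(-survey_year)

print(result)
```

Note that names_sep = '_' assumes every column name contains exactly one underscore. If your real variable names contain underscores themselves, names_pattern with a regular expression is probably the safer choice.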
