
I am trying to transform a cross-sectional data set into a panel data set. I have a variable called "state" and a variable called "year". I would like to rearrange the observations so that there is one row per state per year, with the values being the averages of the other variables (e.g. income) for that state and year. Does anyone have an idea how I could proceed?

Thank you very much in advance!

  • It will be easier to help if you make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – camille Dec 04 '18 at 21:11

1 Answer


If I understand your question correctly, the code below should help. When asking a question, it helps to include a small example data set and your desired output.

This answer uses the dplyr package:

library(dplyr)

Example data:

data <- tibble(state = c("florida", "florida", "florida", 
                      "new_york", "new_york", "new_york"),
               year = c(1990, 1990, 1992, 1992, 1992, 1994), 
               income = c(19, 13, 45, 34, 66, 34))

To produce:

# A tibble: 6 x 3
  state     year income
  <chr>    <dbl>  <dbl>
1 florida   1990     19
2 florida   1990     13
3 florida   1992     45
4 new_york  1992     34
5 new_york  1992     66
6 new_york  1994     34

Code to summarise the data (using the dplyr package):

data %>%
  group_by(state, year) %>%
  summarise(
    mean_income = mean(income)
  )

Produces this output:

# A tibble: 4 x 3
# Groups:   state [?]
  state     year mean_income
  <chr>    <dbl>       <dbl>
1 florida   1990          16
2 florida   1992          45
3 new_york  1992          50
4 new_york  1994          34
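
The question asks for averages of the other variables in general, not only income. A minimal sketch of how that might look, assuming dplyr 1.0.0 or later (for `across()` and the `.groups` argument); the `age` column is hypothetical and added only to have a second variable to average:

```r
library(dplyr)

# Same example data plus a hypothetical second numeric column (age)
data <- tibble(state = c("florida", "florida", "florida",
                         "new_york", "new_york", "new_york"),
               year = c(1990, 1990, 1992, 1992, 1992, 1994),
               income = c(19, 13, 45, 34, 66, 34),
               age = c(30, 40, 50, 20, 40, 60))

# Average every numeric column that is not a grouping column.
# state and year are grouping columns, so across() leaves them alone.
panel <- data %>%
  group_by(state, year) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

panel
```

`na.rm = TRUE` is a choice made here so that missing values do not turn a whole state-year average into `NA`; drop it if missing values should propagate.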
Pete
  • Thank you very much for your explanation! How can I transform a data set with more than 300,000 observations? Is there a more convenient way than typing it for each observation (as in your step 1)? – Sina Bes Dec 05 '18 at 12:05
  • Where is your data currently stored? – Pete Dec 05 '18 at 17:05
  • Currently in a simple data.table – Sina Bes Dec 05 '18 at 17:21
  • You don't need to transform your data; the code to summarise your data still works with a data.table. If you still want to convert your data to a tibble, use `as_tibble()`. – Pete Dec 06 '18 at 09:16
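
To illustrate the last comment, here is a rough sketch of both routes on a data.table. The small `dt` object is a hypothetical stand-in for the real table of 300,000+ rows, which would already be in memory and would not need to be typed in. The second route uses data.table's own grouping syntax, which is an alternative to the dplyr code in the answer rather than part of it.

```r
library(data.table)
library(dplyr)

# Hypothetical stand-in for the existing large data.table
dt <- data.table(state = c("florida", "florida", "florida",
                           "new_york", "new_york", "new_york"),
                 year = c(1990, 1990, 1992, 1992, 1992, 1994),
                 income = c(19, 13, 45, 34, 66, 34))

# Route 1: the dplyr code from the answer, applied to the data.table as-is
panel_dplyr <- dt %>%
  group_by(state, year) %>%
  summarise(mean_income = mean(income))

# Route 2: native data.table grouping; keyby also sorts by state and year
panel_dt <- dt[, .(mean_income = mean(income)), keyby = .(state, year)]

# Optional: convert the original table to a tibble
data_tbl <- as_tibble(dt)
```

Both routes give one row per state-year combination with the mean income for that group.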