1

I have a panel data file (long format) and I need to convert it to cross-sectional data. That is I don't just need a transformation to the wide format but I need exactly one observation per individual that contains the mean for each variable.

Here's what I want to to: I have panel data (a number of observations for each individual) in a data frame and I'm looking for an easy way in R to generate a new data frame that contains cumulated data for each individual, i. e. either the sum of all observations in each variable or their mean. It might also be interesting to get a measure of volatility.

For example I have a given data frame panel_data that contains panel data:

> individual <- c(1,1,2,2,3,3)
> var1 <- c(2,3,3,3,4,3)
> panel_data <- data.frame(individual,var1)
> panel_data
   individual var1
1           1    2
2           1    3
3           2    3
4           2    3
5           3    4
6           3    3

The result should look like this:

> cross_data
   individual var1
1           1    5
2           2    6
3           3    7

Now this is only an example. I need this feature in a number of varieties, the most important one probably being the intra-individual mean for each variable.

mango
  • 17
  • 1
  • 6

1 Answers1

0

There are ways to do this using base R or using the popular packages data.table or dplyr. Everyone has their own preference and mine is dplyr.

You can very easily perform a variety of operation to summarise your data per individual. With dplyr syntax, you first group_by individual to specify that operations should be performed on groups defined by the variable "individual". You can then summarise your groups using a function you specify.

Try the following:

library("dplyr")

panel_data %>%
    group_by(individual) %>%
    summarise(sum_var1 = sum(var1), mean_var1=mean(var1))   

Do not be put off by the %>% notation, it is just a convenient shortcut to chain operations:

  • x %>% f is equivalent to f(x)
  • x %>% f(a) is equivalent to f(x, a)
  • x %>% f(a) %>% g(b) is equivalent to g(f(x, a), b)
asachet
  • 6,620
  • 2
  • 30
  • 74
  • Thanks a lot for your reply. This seems to work in general but if I try to store the new data frame as an object, i. e. "cross_data <- summarise()", I get the error "Error in summarise_(.data, .dots = lazyeval::lazy_dots(...)) : argument ".data" is missing, with no default". The reference for summarise() says that it needs the input data as the first argument but if I add the panel data frame as .data R tells me "Error: data_frames can only contain 1d atomic vectors and lists". Any advice for this problem? – mango Jul 04 '16 at 12:08
  • The three lines chained with `%>%` form a single big statement. If you want to save the output, you have to do `cross_data <- panel_data %>% ... %>% summarise(...)`. Or you can do, although some might frown upon it: `panel_data %>% ... %>% summarise(...) -> cross_data`. – asachet Jul 04 '16 at 13:19
  • Thanks, just figured it out myself! – mango Jul 04 '16 at 21:27