0

I need to change some of my datasets in the following way.

I have one panel dataset containing an unique firm id as identifier (id), the observation year (year, 2002-2012) and some firm variables with the value for the corresponding year (size, turnover etc,). It looks somewhat like:

[ID] [year] [size] [turnover] ...
1    2002    14     1200
1    2003    15     1250
1    2004    17     1100
1    2005    18     1350
2    2004    10     5750
2    2005    11     6025
...

I need to transform it now in the following way.

  • I create an own matrix for every characteristic of interest, where every firm (according to its id) has only one row and the corresponding values per year in separated columns.

  • It should be mentioned, that not every firm is in the dataset in every year, since they might be founded later, already closed down etc., as illustrated in the example. In the end it should look somehow like the following (example for size variable):

[ID] [2002] [2003] [2004] [2005]
1    14     15     17     18
2    -      -      10     11

I tried it so far with the %in% command, but did not manage to get the values in the right columns.

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485

1 Answers1

1
DF <- read.table(text="[ID] [year] [size] [turnover]
1    2002    14     1200
1    2003    15     1250
1    2004    17     1100
1    2005    18     1350
2    2004    10     5750
2    2005    11     6025",header=TRUE)

library(reshape2)
dcast(DF, X.ID.~X.year.,value.var="X.size.")
#   X.ID. 2002 2003 2004 2005
# 1     1   14   15   17   18
# 2     2   NA   NA   10   11
Roland
  • 127,288
  • 10
  • 191
  • 288
  • Thank you all very much. The answers are great! The code as well as the link to the longer discussion with further bg infos. – Daniel S. Hain Apr 23 '13 at 14:30