Get fitted values of a log dependent variable to the original panel data set in R

Question

I am trying to combine the fitted value of the dependent variable of log-log model. My data set is a unbalanced panel. I tried to do it in the way as indicated here. But my question is different because I have already converted my big data set into a plm object and it's a log dependent variable.

My simple data set can be accessed via the following codes.

dat = structure(list(Time = structure(c(9L, 7L, 15L, 1L, 17L, 13L, 
11L, 3L, 23L, 21L, 19L, 5L, 10L, 8L, 16L, 2L, 18L, 14L, 12L, 
4L, 24L, 22L, 20L, 6L), .Label = c("Apr-00", "Apr-01", "Aug-00", 
"Aug-01", "Dec-00", "Dec-01", "Feb-00", "Feb-01", "Jan-00", "Jan-01", 
"Jul-00", "Jul-01", "Jun-00", "Jun-01", "Mar-00", "Mar-01", "May-00", 
"May-01", "Nov-00", "Nov-01", "Oct-00", "Oct-01", "Sep-00", "Sep-01"
), class = "factor"), Firm = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), Out = c(161521L, 
142452L, 365697L, 355789L, 376843L, 258762L, 255447L, 188545L, 
213663L, 273209L, 317468L, 238668L, 241286L, 135288L, 363609L, 
318472L, 446279L, 390230L, 118945L, 174887L, 183770L, 197832L, 
317468L, 238668L), Lab = c(261L, 334L, 156L, 134L, 159L, 119L, 
41L, 247L, 251L, 62L, 525L, 217L, 298L, 109L, 7L, NA, 0L, 50L, 
143L, 85L, 80L, 214L, 525L, 217L), Cap = c(13L, 15L, 14L, 12L, 
15L, 12L, 45L, 75L, NA, 12L, 15L, 16L, 42L, 45L, 24L, 56L, 12L, 
12L, 45L, NA, 15L, 12L, 15L, 16L)), .Names = c("Time", "Firm", 
"Out", "Lab", "Cap"), class = "data.frame", row.names = c(NA, 
-24L))

My data set looks like below and with missing data of the predictors.

    +--------+------+--------+-----+-----+
|  Time  | Firm |  Out   | Lab | Cap |
+--------+------+--------+-----+-----+
| Jan-00 | A    | 161521 | 261 |  13 |
| Feb-00 | A    | 142452 | 334 |  15 |
| Mar-00 | A    | 365697 | 156 |  14 |
| Apr-00 | A    | 355789 | 134 |  12 |
| May-00 | A    | 376843 | 159 |  15 |
| Jun-00 | A    | 258762 | 119 |  12 |
| Jul-00 | A    | 255447 |  41 |  45 |
| Aug-00 | A    | 188545 | 247 |  75 |
| Sep-00 | A    | 213663 | 251 |     |
| Oct-00 | A    | 273209 |  62 |  12 |
| Nov-00 | A    | 317468 | 525 |  15 |
| Dec-00 | A    | 238668 | 217 |  16 |
| Jan-01 | B    | 241286 | 298 |  42 |
| Feb-01 | B    | 135288 | 109 |  45 |
| Mar-01 | B    | 363609 |   7 |  24 |
| Apr-01 | B    | 318472 |     |  56 |
| May-01 | B    | 446279 |   0 |  12 |
| Jun-01 | B    | 390230 |  50 |  12 |
| Jul-01 | B    | 118945 | 143 |  45 |
| Aug-01 | B    | 174887 |  85 |     |
| Sep-01 | B    | 183770 |  80 |  15 |
| Oct-01 | B    | 197832 | 214 |  12 |
| Nov-01 | B    | 317468 | 525 |  15 |
| Dec-01 | B    | 238668 | 217 |  16 |
+--------+------+--------+-----+-----+

I can get the fitted values using the following codes

library(zoo)
library(plm)

Sys.setlocale("LC_TIME", "English")
dat["time1"] <- as.yearmon(dat$Time,format="%b-%y")
pdat <-pdata.frame(dat,index=c("Firm","time1"))
Model1<- plm(log(Out) ~ lag(log(Cap), 1) + log(Lab + 1),
         model = "within", data=pdat)
summary(Model1)

library(data.table)
FV_Log <- data.table(Model1$model[[1]] - Model1$residuals)

But the observations of the pdat is 24 observations FV_Log is 19 observations, so I am unable to merge it to pdat. My pdat is large with several thousands of observations and I have created many variables using codes. So any help to merge the fitted value into the original pdat with properly (without changing the order) would be much appreciated.

score 1 · Accepted Answer · answered Apr 08 '16 at 07:31

The problem you're facing is twofold. First, you have NA's that your model is dropping when calculating the residuals so naturally your result will have fewer elements. The second problem is that you're lagging one of your covariates, which produces an NA for your first observation (first temporally) -after all that's what a lag does. To overcome that problem you need additional data from the previous time period or you have to forgo that observation. Assuming you do not have access to data from the previous time period, this is how I would go about resolving the issue.

 #First I would create a new variable for CAP and just lag and log that separately, rather than applying the function in the formula of the model itself
 pdat$Cap.lag.ln<-lag(log(pdat$Cap), 1)
 pdat$Cap<-NULL #deleting the old variable to clear up the mess

 #Dont necessarily need the na.omit but it couldn't hurt...
 Model1<- plm(log(Out) ~ Cap.lag.ln + log(Lab + 1),
    model = "within", data=pdat, na.omit=TRUE)
 FV_Log <- data.table(Model1$model[[1]] - Model1$residuals)

 #Now this is where you reduce your original dataset (pdat) by getting rid of the NAs
 pdat2<-na.omit(pdat)
 #You will notice that they're the same dimensions now and you can cbind
 pdat3 <-cbind(pdat2,FV_Log)
      Time Firm    Out Lab    time1 Cap.lag.ln       V1
 1: Feb-00    A 142452 334 Feb 2000          2 12.41211
 2: Mar-00    A 365697 156 Mar 2000          2 12.54861
 3: Apr-00    A 355789 134 Apr 2000          2 12.57580
 4: May-00    A 376843 159 May 2000          2 12.54520
 5: Jun-00    A 258762 119 Jun 2000          2 12.59702
 6: Jul-00    A 255447  41 Jul 2000          2 12.78611
 7: Aug-00    A 188545 247 Aug 2000          3 12.28887
 8: Sep-00    A 213663 251 Sep 2000          4 12.10858
 9: Nov-00    A 317468 525 Nov 2000          2 12.33084
 10: Dec-00   A 238668 217 Dec 2000          2 12.48949
 11: Feb-01   B 135288 109 Feb 2001          3 12.20776
 12: Mar-01   B 363609   7 Mar 2001          3 12.67984
 13: May-01   B 446279   0 May 2001          4 12.87698
 14: Jun-01   B 390230  50 Jun 2001          2 12.52360
 15: Jul-01   B 118945 143 Jul 2001          2 12.33665
 16: Aug-01   B 174887  85 Aug 2001          3 12.25209
 17: Oct-01   B 197832 214 Oct 2001          2 12.26445
 18: Nov-01   B 317468 525 Nov 2001          2 12.10331
 19: Dec-01   B 238668 217 Dec 2001          2 12.26195

And if you want to retrieve those NA's back, you can do the following:

 pdat3 <-as.data.frame(pdat3)
 pdat4<-merge(pdat3, pdat, 
    by=c("Time","Firm","Out", "Lab","time1"), 
    all.x=TRUE,all.y=TRUE)

   Time Firm    Out Lab    time1 Cap.lag.ln       V1 Cap
 1  Apr-00    A 355789 134 Apr 2000          2 12.57580  12
 2  Apr-01    B 318472  NA Apr 2001         NA       NA  56
 3  Aug-00    A 188545 247 Aug 2000          3 12.28887  75
 4  Aug-01    B 174887  85 Aug 2001          3 12.25209  NA
 5  Dec-00    A 238668 217 Dec 2000          2 12.48949  16
 6  Dec-01    B 238668 217 Dec 2001          2 12.26195  16
 7  Feb-00    A 142452 334 Feb 2000          2 12.41211  15
 8  Feb-01    B 135288 109 Feb 2001          3 12.20776  45
 9  Jan-00    A 161521 261 Jan 2000         NA       NA  13
 10 Jan-01    B 241286 298 Jan 2001         NA       NA  42
 11 Jul-00    A 255447  41 Jul 2000          2 12.78611  45
 12 Jul-01    B 118945 143 Jul 2001          2 12.33665  45
 13 Jun-00    A 258762 119 Jun 2000          2 12.59702  12
 14 Jun-01    B 390230  50 Jun 2001          2 12.52360  12
 15 Mar-00    A 365697 156 Mar 2000          2 12.54861  14
 16 Mar-01    B 363609   7 Mar 2001          3 12.67984  24
 17 May-00    A 376843 159 May 2000          2 12.54520  15
 18 May-01    B 446279   0 May 2001          4 12.87698  12
 19 Nov-00    A 317468 525 Nov 2000          2 12.33084  15
 20 Nov-01    B 317468 525 Nov 2001          2 12.10331  15
 21 Oct-00    A 273209  62 Oct 2000         NA       NA  12
 22 Oct-01    B 197832 214 Oct 2001          2 12.26445  12
 23 Sep-00    A 213663 251 Sep 2000          4 12.10858  NA
 24 Sep-01    B 183770  80 Sep 2001         NA       NA  15

Also, there's a problem with the lag function you're using, row 10 shouldn't be NA -I'm not sure what's causing this problem, I'm assuming its your unorthodox date variable -so you need to fix that separately. — Cyrus Mohammadian, Apr 08 '16 at 07:43
That's wonderful.Thank you very much for very clear step wise answer. For you second comment regarding the NA at row 10, yes that may be because my month-year format. But I have to work with this format because the data are monthly total and it cannot be indicated as date variable. Thank you for pointing this out. — Doo, Apr 08 '16 at 12:31
I have a problem with the pdat3 here because it has now converted to data table and my original panel data frame has gone. Is there another way to combine fitted values to my original panel data structure. I want to use this fitted vales in the panel model further. Thanks — Doo, Apr 25 '16 at 01:39
The following error message comes when used pdat3 in plm model duplicate couples (time-id) Error in pdim.default(index[[1]], index[[2]]) : — Doo, Apr 25 '16 at 10:57

Get fitted values of a log dependent variable to the original panel data set in R

1 Answers1