I'm using an unbalanced panel dataset with the R plm package. Since one variable is missing for 2010 and some variables contain zero values, I proceeded in two steps:
```r
library(plm)

# Put the panel data into a pdata.frame
dd <- pdata.frame(panel, index = c("UF", "year"))
# Drop the year 2010
year <- dd$year
dd <- dd[year != 2010, ]
# Drop observations where population equals zero
pop <- dd$pop
dd_1 <- dd[pop != 0, ]
# Copy columns into standalone vectors with shorter names
PIB <- dd_1$PIB
DT <- dd_1$despesa
RT <- dd_1$receita
#.....
# Run a pooled OLS model
ols_model <- plm(log(PIB) ~ mortinf + log(prod) + op + log(DT) + Gini + I(log(DT)*Gini) + log(RT) + log(pop),
                 data = dd_1, model = "pooling")
summary(ols_model)
```
However, when I did it this way, I couldn't plot the variables (because dd_1$GDP, for example, isn't treated as a plain vector). So I changed the order of the data manipulation: instead of putting the data into a pdata.frame first, I filtered the observations out of the plain data frame and then told the OLS model which columns are the unit and year indexes.
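For comparison, here is a minimal sketch of the approach described above with one change: instead of copying columns into standalone vectors, every variable that the plots and the model need stays a column of the filtered plain data frame, so both read exactly the same rows. The toy `panel` and its values are hypothetical stand-ins for the real data (only the column names come from the question), the model is cut down to two regressors, and the `plm()` call is guarded with `requireNamespace()` so the sketch also runs where plm is not installed.

```r
# Hypothetical toy stand-in for `panel`: 3 states x 3 years, unbalanced after filtering
panel <- data.frame(
  UF      = rep(c("RJ", "SP", "MG"), times = 3),
  year    = rep(c(2011, 2009, 2010), each = 3),
  pop     = c(80, 120, 0, 75, 100, 60, 78, 110, 65),
  PIB     = c(41, 63, 20, 37, 52, 29, 39, 58, 31),
  despesa = c(8.5, 12.4, 4.0, 7.1, 10.3, 6.2, 7.9, 11.5, 6.6)
)

# Apply both filters to the plain data frame in one step
dd_1 <- subset(panel, year != 2010 & pop != 0)

# "Rename" by adding a column, not by creating a standalone vector
dd_1$DT <- dd_1$despesa

# Plain data-frame columns are ordinary numeric vectors, so base graphics accept them
# (only drawn in an interactive session)
if (interactive()) plot(log(dd_1$DT), log(dd_1$PIB), xlab = "log(DT)", ylab = "log(PIB)")

# Every name in the formula is now a column of dd_1
if (requireNamespace("plm", quietly = TRUE)) {
  ols_model <- plm::plm(log(PIB) ~ log(DT) + log(pop), data = dd_1,
                        model = "pooling", index = c("UF", "year"))
  print(coef(ols_model))
}
```

The design choice here is simply that the data frame passed as `data` is the only source of variables, so nothing depends on whether two separately created objects happen to share the same row order.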
```r
# Drop the year 2010
year <- panel$year
dd <- panel[year != 2010, ]
# Drop observations where pop == 0
pop <- dd$pop
dd_1 <- dd[pop != 0, ]
# Copy columns into standalone vectors
GDP <- dd_1$GDP
DT <- dd_1$DT
#...
# This way I could plot, for example, GDP against DT
# Then I ran an OLS model, passing the indexes explicitly:
ols_model <- plm(log(PIB) ~ mortinf + log(prod) + op + log(DT) + Gini + I(log(DT)*Gini) + log(RT) + log(pop),
                 data = dd_1, model = "pooling", index = c("UF", "year"))
summary(ols_model)
```

But it gave different results from the first OLS model!
As I understand it, both models should produce the same results, but they are very different. Could anyone help me understand why? What is the right way to do this? Thanks in advance.
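One possible cause worth ruling out, offered as an assumption rather than a confirmed diagnosis: R's `model.frame()` looks up each name in a formula first in `data` and only then in the environment where the formula was created, so a standalone vector is used in whatever row order it was built in. If `plm()` with `index =` converts the plain data frame to a pdata.frame and that conversion sorts rows by unit and year (my understanding, worth checking in `?pdata.frame`), the columns of `dd_1` get reordered but the standalone vectors do not. The base-R sketch below reproduces that effect with `lm()` on hypothetical toy data; `toy`, `x_global` and the explicit sort are illustrative only.

```r
# Hypothetical toy panel: 3 units x 3 years, stored year by year
# (i.e. NOT sorted by unit and then year)
set.seed(1)
toy <- data.frame(
  UF   = rep(c("A", "B", "C"), times = 3),
  year = rep(2009:2011, each = 3)
)
toy$x <- runif(9, min = 1, max = 10)
toy$y <- 2 * log(toy$x) + rnorm(9, sd = 0.1)

# Standalone copy of x, taken in the ORIGINAL row order
x_global <- toy$x

# Sort rows by unit, then year (the kind of reordering a panel
# constructor may apply internally; assumed here, not verified for plm)
toy_sorted <- toy[order(toy$UF, toy$year), ]

# Column reference: x is found in `data`, so it moves with the rows
fit_col <- lm(y ~ log(x), data = toy_sorted)

# Standalone reference: x_global is not a column of `data`, so it is taken
# from the formula's environment in its old order and pairs with the wrong y
fit_glob <- lm(y ~ log(x_global), data = toy_sorted)

coef(fit_col)   # slope close to the true value of 2
coef(fit_glob)  # different estimates from the same numbers
```

In the question's code, any name in the formula that is not a column of `dd_1` falls into the `x_global` case; which names those are depends on the actual column names of `panel`.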