Order in taking out values in a panel data set R

Question

I'm working with an unbalanced panel dataset in R using the plm package. Since one variable is missing for 2010 and some variables contain zero values, I proceeded in two steps:

#Puts the panel data into a pdata.frame
dd <- pdata.frame(panel, index = c('UF', 'year'))

#Takes out the year 2010
year <- dd$year
dd <- dd[year!=2010, ]

#Takes out values where population equals zero
pop <- dd$pop
dd_1 <- dd[pop!= 0,]

#Copies the variables into standalone vectors
PIB <- dd_1$PIB
DT <- dd_1$despesa
RT <- dd_1$receita
#.....

#Runs an OLS model
ols_model <- plm(log(PIB) ~ mortinf + log(prod) + op + log(DT) + Gini + I(log(DT)*Gini) + log(RT) + log(pop),
                 data = dd_1, model = "pooling")
summary(ols_model)

However, with this approach I couldn't plot the variables (because dd_1$GDP, for example, isn't treated as a plain vector). So I changed the order of the data manipulation: instead of first putting the data into a pdata.frame, I removed the observations from the raw panel and then passed the unit and year indexes directly to plm.

year <- panel$year
dd <- panel[year!=2010,]

#Takes out observations where pop == 0
pop <- dd$pop
dd_1 <- dd[pop!=0, ]

#Copies the variables into standalone vectors

GDP <- dd_1$GDP
DT <- dd_1$DT
#...
#This way I could plot, for example, GDP x DT in a graph
#Then I ran an OLS model:
ols_model <- plm(log(PIB) ~ mortinf + log(prod) + op + log(DT) + Gini + I(log(DT)*Gini) + log(RT) + log(pop),
                 data = dd_1, model = "pooling", index = c('UF', 'year'))
summary(ols_model)

#But it gave different results than the first OLS!

As I understand it, both models should have produced the same results, but they were very different. Could anyone please help me? Which is the right way? Thanks in advance.

I am a bit confused. Firstly, there is no need to create a new variable every time you subset. You can use ```panel[panel$year != 2010, ]```. When you write "rename variables" you are actually creating new vectors; you aren't renaming them. I think it would be good practice to use only variables that are in the pdata.frame object when you run plm. Also, it is a bit difficult to help if you don't provide a reproducible example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — desval, May 04 '20 at 18:15
Thanks for the suggestion! I've made a reproducible question, if you could still help me: https://stackoverflow.com/questions/61600824/how-to-correctly-take-out-zero-observations-in-panel-data-in-r — francescobfc, May 04 '20 at 20:15
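
A minimal sketch of what desval's suggestion could look like, assuming a toy `panel` with made-up values. Only the column names UF, year, pop and PIB come from the question; `clean`, `pd` and `fit` are hypothetical names, and the two-variable formula is a stand-in for the full model. It filters the plain data.frame in one step, plots from it, and converts to a pdata.frame only for estimation, so the formula refers to columns of the data rather than to standalone vectors:

```r
# Toy stand-in for `panel`; the values are invented for illustration only.
panel <- data.frame(
  UF   = rep(c("AC", "AL", "AM"), each = 3),
  year = rep(c(2008, 2009, 2010), times = 3),
  pop  = c(10, 12, 13, 0, 20, 21, 30, 31, 32),
  PIB  = c(100, 110, 120, 50, 210, 220, 300, 310, 330)
)

# Drop the year 2010 and the zero-population rows in a single step,
# working on the plain data.frame (no intermediate vectors needed).
clean <- panel[panel$year != 2010 & panel$pop != 0, ]

# Columns of a plain data.frame are ordinary vectors, so plotting works.
plot(clean$pop, clean$PIB)

# Convert to a pdata.frame only for estimation (requires the plm package).
if (requireNamespace("plm", quietly = TRUE)) {
  pd  <- plm::pdata.frame(clean, index = c("UF", "year"))
  fit <- plm::plm(log(PIB) ~ log(pop), data = pd, model = "pooling")
  print(summary(fit))
}
```

With this ordering there is a single filtered dataset, so the plots and the model are drawn from the same rows.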

0 Answers