Data structure issue with plm: multiple observations with the same year-country pair

Question

I am trying to run fixed-effects and random-effects regressions on data that, as far as I understand, is "pseudo" panel data. The dataset consists of rows of syndicated loans with borrower-country-level variables attached. How can I apply pdata.frame() when the data has multiple observations (loans) per country-year pair? If I use borrower country and year as the index, I get the following warning:

Warning in pdata.frame(df, index = c("borrower_country", "year")) :
  duplicate couples (id-time) in resulting pdata.frame
 to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")
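For reference, the duplicated country-year cells can also be listed without plm. This is a minimal base R sketch on toy data; the data frame `df` and the columns `loan_id`, `borrower_country` and `year` are stand-ins for my real data, not the actual values.

```r
# Toy data (hypothetical): loan-level rows, several loans per country-year
df <- data.frame(
  loan_id          = 1:6,
  borrower_country = c("DE", "DE", "DE", "FR", "FR", "US"),
  year             = c(2001, 2001, 2002, 2001, 2001, 2001)
)

# Number of loans in each country-year cell
cell_counts <- as.data.frame(table(df$borrower_country, df$year),
                             stringsAsFactors = FALSE)
names(cell_counts) <- c("borrower_country", "year", "n_loans")

# Cells holding more than one loan are the "duplicate couples" in the warning
dup_cells <- cell_counts[cell_counts$n_loans > 1, ]
print(dup_cells)
```

Any cell with `n_loans > 1` means that (borrower_country, year) cannot serve as a unique id-time index.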

I've found some related questions on the topic (link), but I don't see how they would apply to my case. Each loan in the data has a unique id number, but if I were to pair that with e.g. year, as was suggested in the link, wouldn't that ruin any year fixed/random effects I may want?

I know that panel data methods have been used in studies with a similar data structure, but I don't understand how to apply the methods to my data.

The same question is posted in expanded form on Cross Validated.

It sounds like the loans are your units of observation, so you would put the loan ID into the first index dimension. E.g., your dependent variable could be the loan amount. The borrower country could then be a variable of interest: does it affect the loan amount, and how large is the effect? - Helix123, Sep 07 '21 at 21:17
Thank you for your reply. My dependent variable is the loan spread, and the main independent variables of interest are borrower- and lender-related. In addition, I have a collection of controls. The thing is, my data is not pure panel data but rather independent cross-sections of loans from 20 years, so I don't have a time dimension that observes the same loans repeatedly. Previously I used id and year as the index, but is that valid, given that it results in "Unbalanced Panel: n = 5101, T = 1-1, N = 5101"? It works with plm's "pooling" model but not with the others. I expanded the question in the linked Cross Validated question. - Moz, Sep 08 '21 at 06:21
If you do not observe the same ids over several periods, you do not have panel data. plm's `model = "pooling"` works because it simply uses the data set as is, whether or not the time dimension you specified is sane. You could use `lm()` instead. - Helix123, Sep 09 '21 at 07:23
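A minimal sketch of the `lm()` route mentioned above, assuming the goal is country and year fixed effects on loan-level repeated cross-sections. The data are simulated, and the names `loans`, `spread` and `leverage` are hypothetical placeholders, not variables from the question.

```r
# Simulated loan-level data (hypothetical): one row per loan, no loan observed twice
set.seed(1)
n <- 200
loans <- data.frame(
  borrower_country = sample(c("DE", "FR", "US", "JP"), n, replace = TRUE),
  year             = sample(2000:2004, n, replace = TRUE),
  leverage         = rnorm(n)   # placeholder regressor of interest
)
# Spread generated with a true slope of 5 on leverage, plus noise
loans$spread <- 100 + 5 * loans$leverage + rnorm(n)

# Pooled OLS with country and year dummies: the dummies absorb
# country-level and year-level fixed effects without needing a panel index
fe_fit <- lm(spread ~ leverage + factor(borrower_country) + factor(year),
             data = loans)

# Slope on the regressor of interest
print(coef(fe_fit)["leverage"])
```

Because each loan appears only once, loan-level fixed or random effects cannot be estimated here; the dummies work at the country and year level instead. Whether this specification suits the real data is a judgment call that the sketch does not settle.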
