I'm currently working on a machine learning project in Python (beginner here, learning everything from scratch).
I'd like to know the difference between statsmodels' OLS and linearmodels' PooledOLS when both are fitted on the same panel dataset. I tried both and they gave me the same results. Does that mean they are essentially doing the same thing, just from different packages? Am I supposed to get the same results, or am I doing something wrong?
My dataset looks something like this:
                      excessreturnlag1m     ROA  ...  momentum6m  momentum12m
    bank  date                                   ...
    bankA 2019-06-30         -14.564600  0.9795  ...        0.14        -0.24
          2019-05-31           7.522300  0.9795  ...       -0.69        -1.97
          2019-04-30          -2.020400  0.9795  ...        1.36        -1.70
    bankB 2019-06-30          -5.969600  0.9915  ...       -0.39        -1.77
          2019-05-31           0.220200  0.9915  ...       -0.24        -2.00
          2019-04-30          -1.900000  0.9915  ...       -0.06        -1.42
    bankC 2019-06-30           2.721700  0.9763  ...       -0.38        -1.13
          2019-05-31          -8.418900  0.9763  ...       -1.28        -1.19
          2019-04-30          -1.001100  0.9763  ...       -3.06        -1.16
I currently have a MultiIndex (bank and date) on my DataFrame. Am I supposed to use that to run a panel regression?
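For reference, here is a minimal sketch of how I understand the index should be laid out for panel estimators: a two-level MultiIndex with the entity (bank) first and the time (date) second, with the date level stored as a datetime. The frame `raw` and the variable `panel` are names made up for this illustration, and only a few rows and two columns from the printout above are reproduced.

```python
import pandas as pd

# A few rows copied from the printout above (remaining columns omitted).
# `raw` and `panel` are hypothetical names used only for this sketch.
raw = pd.DataFrame(
    {
        "bank": ["bankA", "bankA", "bankB", "bankB"],
        "date": ["2019-06-30", "2019-05-31", "2019-06-30", "2019-05-31"],
        "excessreturnlag1m": [-14.5646, 7.5223, -5.9696, 0.2202],
        "ROA": [0.9795, 0.9795, 0.9915, 0.9915],
    }
)

# Make sure the time level is a real datetime, not a string.
raw["date"] = pd.to_datetime(raw["date"])

# Entity first, time second, then sort so each bank's dates are in order.
panel = raw.set_index(["bank", "date"]).sort_index()

print(panel.index.names)
print(panel.index.get_level_values("date").dtype)
```

If the index already looks like this, my assumption is that no further reshaping is needed before passing the data to a panel model.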
Edit: OK, from what I understand, PooledOLS is a "special" case of multiple linear regression (plain OLS that ignores the panel structure), so it should give the same coefficients as statsmodels' OLS? Correct me if I'm wrong!