I'm currently doing a machine learning project in Python (beginner here, learning everything from scratch).

I'd like to know the difference between statsmodels' OLS and scikit's PooledOLS when applied to the same panel dataset. I tried both and they gave me the same results. Does that mean they are essentially doing the same thing, just from different packages? Am I supposed to get the same results, or am I doing something wrong?

My dataset looks something like this:

                  excessreturnlag1m      ROA  ...  momentum6m  momentum12m
bank  date                                    ...                         
bankA 2019-06-30         -14.564600   0.9795  ...        0.14        -0.24
      2019-05-31           7.522300   0.9795  ...       -0.69        -1.97
      2019-04-30          -2.020400   0.9795  ...        1.36        -1.70
bankB 2019-06-30          -5.969600   0.9915  ...       -0.39        -1.77
      2019-05-31           0.220200   0.9915  ...       -0.24        -2.00
      2019-04-30          -1.900000   0.9915  ...       -0.06        -1.42
bankC 2019-06-30           2.721700   0.9763  ...       -0.38        -1.13
      2019-05-31          -8.418900   0.9763  ...       -1.28        -1.19
      2019-04-30          -1.001100   0.9763  ...       -3.06        -1.16

I currently have a MultiIndex (bank and date) on my DataFrame. Am I supposed to use that to do a panel regression?

Edit: OK, from what I understand, PooledOLS is a "special" case of multiple linear regression, so it will give the same results as statsmodels' OLS? Correct me if I'm wrong!

Adriel L
  • Possible duplicate of [OLS Regression: Scikit vs. Statsmodels?](https://stackoverflow.com/questions/22054964/ols-regression-scikit-vs-statsmodels) – steven Sep 28 '19 at 16:35
  • https://stats.stackexchange.com/questions/146804/difference-between-statsmodel-ols-and-scikit-linear-regression/146809 – steven Sep 28 '19 at 16:35
  • https://becominghuman.ai/stats-models-vs-sklearn-for-linear-regression-f19df95ad99b – steven Sep 28 '19 at 16:35
  • Hmm, but those are OLS vs. linear regression. Does that mean that PooledOLS for panel data is the same as linear regression? – Adriel L Sep 29 '19 at 06:03
  • https://pypi.org/project/linearmodels/ – steven Sep 29 '19 at 23:12

1 Answer

OLS (ordinary least squares) is just an estimation method, so any correct implementation should return the same coefficients for the same data and model.
Implementations can differ in how they solve the normal equations, which changes the amount of computation, so it may be worth measuring the execution time of each.
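
As a rough illustration of that point, here is a minimal sketch using only NumPy rather than statsmodels or any panel package. The data is synthetic and the names (`n_banks`, `true_beta`, and so on) are made up for the example. It stacks every bank-month row into one design matrix, which is what a pooled regression amounts to, and then fits it two ways: by solving the normal equations directly and by calling a general least-squares solver.

```python
import numpy as np

# Synthetic panel: 3 hypothetical banks x 4 months, 2 regressors (made-up data).
rng = np.random.default_rng(0)
n_banks, n_months, n_features = 3, 4, 2

# Stack every bank-month observation into one design matrix. A pooled fit
# ignores which bank or date a row belongs to.
X = rng.normal(size=(n_banks * n_months, n_features))
X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
true_beta = np.array([0.5, 1.0, -2.0])     # assumed coefficients for the demo
y = X @ true_beta + rng.normal(scale=0.1, size=len(X))

# Route 1: solve the normal equations (X'X) beta = X'y directly.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Route 2: NumPy's general least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)
print(beta_lstsq)

# Both routes minimise the same sum of squared residuals, so the estimates
# should agree up to floating-point error.
same = np.allclose(beta_normal, beta_lstsq)
print(same)
```

If the two coefficient vectors agree here, the same reasoning suggests why two libraries fitting the same pooled model on the same data would also agree. What can differ between them is the solver used, the run time, and extras such as how standard errors are computed.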

luthierBG