
While doing some OLS regressions, I discovered that statsmodels.api.add_constant() does the following:

if _is_using_pandas(data, None) or _is_recarray(data):
    from statsmodels.tsa.tsatools import add_trend
    return add_trend(data, trend='c', prepend=prepend, has_constant=has_constant)

If not, it treats data as an ndarray, so you lose some contextual information (e.g. the column names, which are the names of the regressor variables). When pandas is imported from modin, the _is_using_pandas() check above returns False.

It is possible that statsmodels needs to add modin as a supported option in its _is_using_pandas(), but for now I'd like to do something like:

if is_using_modin_pandas(x):
    from statsmodels.tsa.tsatools import add_trend
    X = add_trend(x, trend='c', prepend=True, has_constant='skip')
else:
    X = sm.add_constant(x)

How would one write is_using_modin_pandas()?
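One possible way to write such a check, offered as a sketch rather than a confirmed approach: look at the module in which the object's class is defined, so that modin itself never has to be imported. This assumes that modin's DataFrame and Series classes report a module name of `modin` or `modin.<something>`. The helper name comes from the question; the stand-in class in the usage lines is hypothetical and exists only so the check can be tried without modin installed.

```python
def is_using_modin_pandas(*objs):
    """Return True if any argument's class is defined inside the modin package.

    Assumption: modin objects report a __module__ of "modin" or "modin.<...>".
    """
    for obj in objs:
        module = type(obj).__module__ or ""
        if module == "modin" or module.startswith("modin."):
            return True
    return False


# Hypothetical stand-in class that only mimics the module name of a modin frame.
FakeModinFrame = type("DataFrame", (), {"__module__": "modin.pandas.dataframe"})

print(is_using_modin_pandas(FakeModinFrame()))  # True
print(is_using_modin_pandas([1, 2, 3]))         # False
```

Checking the module name instead of calling isinstance() keeps modin an optional dependency, at the cost of relying on how the package names its modules.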

s5s
  • You could try to monkey patch the `_is_using_pandas` method? – Iain Shelvington Dec 29 '19 at 13:47
  • @IainShelvington I moved away from patching some time ago, but if I remember correctly, when monkey patching you have to patch the name in every module that uses it, because the patch target changes? So to have a generic `is_using_modin_pandas()` I would possibly also need to examine the stack to generate the patch string? – s5s Dec 29 '19 at 13:50
  • You just need to import the module where the function is defined and then overwrite that name in the module with your new function. `from foo import bar; bar.function_to_patch = my_new_function`. Since `sys.modules` is cached and is where every module is imported from, when you overwrite it once it should be overwritten everywhere – Iain Shelvington Dec 29 '19 at 13:52
  • https://stackoverflow.com/a/2375443/548562 – Iain Shelvington Dec 29 '19 at 13:54
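The point debated in the comments above can be tried out with a small sketch. It uses two hypothetical in-memory modules, `lib_data` and `lib_tools`, in place of any real library, and assumes the consuming module pulls the helper in with a `from ... import ...` statement. Whether statsmodels imports `_is_using_pandas` this way is an assumption to check against the installed version.

```python
import sys
import types

# Hypothetical module "lib_data" that defines the helper.
lib_data = types.ModuleType("lib_data")
exec("def _is_using_pandas(data):\n    return False\n", lib_data.__dict__)
sys.modules["lib_data"] = lib_data

# Hypothetical module "lib_tools" that copies the name via a from-import.
lib_tools = types.ModuleType("lib_tools")
exec(
    "from lib_data import _is_using_pandas\n"
    "def add_constant(data):\n"
    "    return 'pandas path' if _is_using_pandas(data) else 'ndarray path'\n",
    lib_tools.__dict__,
)
sys.modules["lib_tools"] = lib_tools


def patched(data):
    """Replacement check that always reports pandas-like input."""
    return True


# Patch only the defining module: lib_tools still holds the old reference.
lib_data._is_using_pandas = patched
result_after_first_patch = lib_tools.add_constant(object())

# Patch the name in the module that actually looks it up.
lib_tools._is_using_pandas = patched
result_after_second_patch = lib_tools.add_constant(object())

print(result_after_first_patch)   # ndarray path
print(result_after_second_patch)  # pandas path
```

In this sketch a single overwrite in the defining module is not enough once another module has bound the name with a from-import; the overwrite has to target the namespace where the lookup happens. If the consuming module instead called `lib_data._is_using_pandas(...)` through the module attribute, the first patch alone would take effect.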

0 Answers