Creating a CSV file from regressions using Statsmodels and Pandas

Question

I've been creating various regressions using Statsmodels. I took data, shaped it into Pandas dataframes, and then ran several models on the data. I'm now struggling to output all of those regressions as a CSV file. My goal is to have all of my "regression data" (ie the coefficients, intercepts, standard errors, etc for each control/variable, as well as the number of observations and a few other datapoints) on one axis, with the title of each regression forming the other axis.

So far I've tried multiple approaches, with one looking the most promising. That method has been using

results = FoodPriceReg(PriceChange, RightHandVars)
regexport = RegToCSV(results)
return regexport

to turn the printed summary into a CSV file. I then use

for com in commodity:
    RegOut = RegLoop(com)
    regressions = pd.DataFrame(RegOut)
    name = 'regressions/' + com[2]
    SaveFrame(regressions, name)

to output the regressions as a CSV + .dta file for each food category.

I've also tried both sorting those CSV files into nested lists and converting them to dataframes and trying to work with them. The biggest issue I've had is that the CSV output is very rough and challenging to work with. It's not organized like other Pandas dataframes and I've been unable to come up with a reasonably simple solution to get all of the data in the CSV sorted so that if you open it in Excel, each piece of information would end up in it's own cell.

To clarify, right now each cell of my final CSV output looks like

Dep. Variable:   ,ParboiledCoarseRice2014,  R-squared:         ,   0.010
Model:           ,OLS                    ,  Adj. R-squared:    ,  -0.000
Method:          ,Least Squares          ,  F-statistic:       ,  0.9711 
            ,   coef   , std err ,    t    ,P>|t| ,  [0.025 ,  0.975] 
Intercept   ,   28.5204,    0.216,  131.855, 0.000,   28.095,   28.945
Cash        ,    4.5696,    0.501,    9.112, 0.000,    3.584,    5.555
Food        ,    4.1321,    0.501,    8.240, 0.000,    3.147,    5.117
FoodCash    ,    4.2496,    0.501,    8.474, 0.000,    3.264,    5.235
CashTraining,    5.2596,    0.675,    7.787, 0.000,    3.933,    6.587
FoodTraining,    5.8696,    0.675,    8.691, 0.000,    4.543,    7.197
Control     ,    4.4396,    0.501,    8.853, 0.000,    3.454,    5.425

whereas I want each piece of information to be it's own row, like:

Model: ParboiledCoarseRice2014 ~ Treatment Dummies
R-squared: 0.010
Cash Coef: 4.5696
Cash Std Err: 0.501

I'm thinking that I'm missing something fundamental with working with Statsmodels, as the output of regressions is sparsely documented but seems essential to get much use out of the package.

Why csv? In my opinion storing a dictionary is more straightforward. You could use Json. Then it might be easier to display the results and different algorithms don't mess up your format as long as certain keywords are filled — Quickbeam2k1, Aug 21 '17 at 19:23
The economist I'm working for requested the output in CSV or .dta format. How would I go about working with the results as a dictionary/JSON? I could reshape the results in that format and then use PANDAS to export the data in the format I need at the end. — Jackson Luckey, Aug 21 '17 at 19:41
You want to provide a report. Reports typically aren't writen in csv format. The advantage of dictionaries/json is that you have key value pairs. Those can be used easily with python in [format strings](https://docs.python.org/3/tutorial/inputoutput.html). Or you might event want to construct an [html report](https://stackoverflow.com/questions/12070631/how-to-use-json-file-in-html-code). However, in my opionio csv is simply the wrong tool for this (and I'm also not sure if pandas is the right tool for the reports) — Quickbeam2k1, Aug 21 '17 at 19:50
What would the right tool for a report be? The results are coming from Statsmodels. — Jackson Luckey, Aug 21 '17 at 19:53
Depends on how flexibel you want to be. You could just provide an image (like png). The [summary](http://www.statsmodels.org/dev/generated/statsmodels.iolib.summary.Summary.html) provides .csv, .latex, .html and text export. Does the .csv export suffice for you? Probably the csv export is intended to be viewed from excel (this is good to continue working with the data, but I would dislike csv as a report). I'd go with latex (due to being a mathemtician) but html seems good to. Use some CSS and you have a nicely formatted report. — Quickbeam2k1, Aug 21 '17 at 20:00
Thanks for being so helpful! Sadly, the CSV export does not quite suffice for me in this case as my boss requested to have each comma separated value end up in a separate row/column. Is there a better way to approach that then trying to write a script to parse Statsmodel's CSV output and build a new CSV file from the fragments? I believe he wants the report formatted that way so he can compare a huge number of regressions easily. — Jackson Luckey, Aug 21 '17 at 22:12
I fear there is not. Another option might be to write him a tool to compare several reports quickly, but he'll probably don't see a reason why he should change his habits (and additionally it seems to work for now). Or you try to understand what the e.g. add_extra_txt method in the [summary class](http://www.statsmodels.org/dev/generated/statsmodels.iolib.summary.Summary.html). Maybe this could help you and speed up the process a bit — Quickbeam2k1, Aug 22 '17 at 06:29

Creating a CSV file from regressions using Statsmodels and Pandas

0 Answers0