A Python library for describing statistical models and building design matrices, aimed at bringing the convenience of R “formulas” to Python.
Questions tagged [patsy]
113 questions
29
votes
2 answers
How to persist patsy DesignInfo?
I'm working on an application that is a "predictive-model-as-a-service", structured as follows:
train a model offline
periodically upload model parameters to a "prediction server"
the prediction server takes as input a single observation, and…

exp1orer
17
votes
3 answers
Python: How to evaluate the residuals in StatsModels?
I want to evaluate the residuals: (y - ŷ).
I know how to do that:
df = pd.read_csv('myFile', delim_whitespace = True, header = None)
df.columns = ['column1', 'column2']
y, X = ps.dmatrices('column1 ~ column2',data = df, return_type =…

DanielTheRocketMan
8
votes
2 answers
Patsy: New levels in categorical fields in test data
I am trying to use Patsy (with sklearn and pandas) to create a simple regression model. The R-style formula creation is a major draw.
My data contains a field called 'ship_city' which can have any city from India. Since I am partitioning the data…

DaSarfyCode
7
votes
1 answer
PatsyError: Number of rows mismatch between data argument and column (statsmodels)
I'm working with statsmodels using R-style formulas with the Patsy package and receiving an error I can't make heads or tails of. Any tips or tricks would be greatly appreciated.
PatsyError: Number of rows mismatch between data argument and
…

R_Queery
7
votes
1 answer
using ols from statsmodels.formula.api - how to remove constant term?
I'm following this first example in statsmodels tutorial:
http://statsmodels.sourceforge.net/devel/
How do I specify that no constant term should be used for the linear fit in ols?
# Fit regression model (using the natural log of one of the regressors)
results =…

denfromufa
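A minimal sketch of one common approach to the question above: patsy formulas omit the intercept when `- 1` (or `+ 0`) is added to the right-hand side. The helper name `without_intercept`, the variables `y` and `x`, and the data frame `df` are hypothetical, since the excerpt's own formula is truncated.

```python
def without_intercept(formula):
    """Append '- 1' so that patsy omits the constant (intercept) column."""
    return f"{formula} - 1"


# Hypothetical formula; the original question's formula is not shown in full.
formula = without_intercept("y ~ x")
print(formula)  # y ~ x - 1

# With statsmodels installed, the string would be passed on unchanged, e.g.:
# import statsmodels.formula.api as smf
# results = smf.ols(formula, data=df).fit()  # df is a hypothetical DataFrame
```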
6
votes
2 answers
Statsmodels: Short way of writing Formula
Logistic regression model using statsmodels:
log_reg = st.logit(formula = 'label ~ pregnant + glucose + bp + insulin + bmi + pedigree + age', data=pima).fit()
Is there a shorter way of writing the second part of the formula (pregnant + glucose + bp +…

BhushanD
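One way to shorten the formula in the question above, sketched under assumptions: build the right-hand side from the column names instead of typing it out. The helper `build_formula` is hypothetical, and the column list is assumed to match the names in the question's formula; in practice it might come from `pima.columns`.

```python
def build_formula(response, columns):
    """Return a formula that uses every column except the response as a predictor."""
    predictors = [c for c in columns if c != response]
    return f"{response} ~ " + " + ".join(predictors)


# Assumed column names, copied from the formula quoted in the question.
columns = ["pregnant", "glucose", "bp", "insulin", "bmi", "pedigree", "age", "label"]
formula = build_formula("label", columns)
print(formula)
# label ~ pregnant + glucose + bp + insulin + bmi + pedigree + age
```

The resulting string could then be passed as the `formula` argument in place of the hand-written one.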
5
votes
1 answer
Using ols function with parameters that contain numbers/spaces
I am having a lot of difficulty using the statsmodels.formula.api function
ols(formula,data).fit().rsquared_adj
due to the nature of the names of my predictors.
The predictors have numbers and spaces etc in them which it clearly doesn't…

Thomas
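A hedged sketch for the question above: patsy provides a `Q()` builtin for quoting variable names that are not valid Python identifiers, such as names containing spaces or starting with a digit. The helpers `quote_term` and `build_quoted_formula` and the column names below are hypothetical illustrations, not taken from the question.

```python
def quote_term(name):
    """Wrap a column name in patsy's Q("...") so spaces and digits are accepted."""
    # Escape backslashes and double quotes so the name survives inside the quotes.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'Q("{escaped}")'


def build_quoted_formula(response, predictors):
    """Build a formula in which every name is quoted with Q()."""
    rhs = " + ".join(quote_term(p) for p in predictors)
    return f"{quote_term(response)} ~ {rhs}"


# Hypothetical column names containing spaces and leading digits.
formula = build_quoted_formula("sales 2014", ["price per unit", "2nd quarter"])
print(formula)
# Q("sales 2014") ~ Q("price per unit") + Q("2nd quarter")
```

An alternative that avoids quoting altogether is to rename the columns to valid identifiers before fitting.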
5
votes
1 answer
python logistic regression (beginner)
I'm working on teaching myself a bit of logistic regression using Python. I'm trying to apply the lessons in the walkthrough here to the small dataset in the Wikipedia entry here.
Something doesn't seem quite right. Wikipedia and Excel Solver…

drew_is_good
4
votes
1 answer
How to make a sm.Logit regression on patsy matrix?
I wanted to create a Logit plot for a natural cubic spline function with four degrees of freedom for P(wage > 250), but an error occurs for some reason. I don't understand why, because OLS works fine.
Here is the code (it should be fully working…

user9102437
4
votes
1 answer
How to plot the confidence interval for statsmodels fit?
I wanted to show the confidence interval on the plot I made for the cubic spline of the data, but I have no idea how it should be done. From theory, I know that the CI should diverge from the fitted line when we get closer to the edges,…

user9102437
3
votes
1 answer
Python way to automatically test interaction effects in OLS
In R you can essentially write model='Lottery ~ (Literacy + Wealth + Region)^k' and get every k-way combination of those variables.
statsmodels supports some R-style OLS regressions, but it doesn't seem to support the ^k syntax. I have a large…

LMGagne
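A sketch of one way to approach the question above without relying on `^k`: generate every interaction up to order k explicitly, joining variables with patsy's `:` interaction operator. The helper `interaction_formula` is hypothetical. patsy's formula documentation also describes a `**` operator, as in `(a + b + c)**2`, which may make this helper unnecessary; the sketch below does not depend on it.

```python
from itertools import combinations


def interaction_formula(response, variables, k):
    """Build a formula with all main effects and interactions up to order k."""
    terms = []
    for order in range(1, k + 1):
        for combo in combinations(variables, order):
            # ':' denotes a pure interaction term in patsy formulas.
            terms.append(":".join(combo))
    return f"{response} ~ " + " + ".join(terms)


# Variable names taken from the question's R example.
formula = interaction_formula("Lottery", ["Literacy", "Wealth", "Region"], 2)
print(formula)
# Lottery ~ Literacy + Wealth + Region + Literacy:Wealth + Literacy:Region + Wealth:Region
```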
3
votes
1 answer
Clustered standard errors in statsmodels with categorical variables (Python)
I want to run a regression in statsmodels that uses categorical variables and clustered standard errors.
I have a dataset with columns institution, treatment, year, and enrollment. Treatment is a dummy, institution is a string, and the others are…

tower489
3
votes
0 answers
Weighted Least Squares in Statsmodels vs. Numpy?
I am trying to replicate the functionality of Statsmodels's weighted least squares (WLS) function with Numpy's ordinary least squares (OLS) function (i.e. Numpy refers to OLS as just "least squares").
In other words, I want to compute the WLS in…

Code Doggo
3
votes
3 answers
How do I get the columns that a statsmodels / patsy formula depends on?
Suppose I have a pandas dataframe:
df = pd.DataFrame({'x1': [0, 1, 2, 3, 4],
                   'x2': [10, 9, 8, 7, 6],
                   'x3': [.1, .1, .2, 4, 8],
                   'y': [17, 18, 19, 20, 21]})
Now I fit a statsmodels model…

bwk
3
votes
2 answers
Namespace issues when calling patsy within a function
I am attempting to write a wrapper for the statsmodels formula API (this is a simplified version, the function does more than this):
import statsmodels.formula.api as smf
def wrapper(formula, data, **kwargs):
    return smf.logit(formula,…

chriswhite