Questions tagged [patsy]

A Python library for describing statistical models and building design matrices, aimed at bringing the convenience of R “formulas” to Python.

113 questions
29 votes, 2 answers

How to persist patsy DesignInfo?

I'm working on an application that is a "predictive-model-as-a-service", structured as follows: train a model offline periodically; upload model parameters to a "prediction server"; the prediction server takes as input a single observation, and…
exp1orer
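A minimal sketch of reusing a patsy `DesignInfo` at prediction time, assuming the toy training frame below (the columns `x` and `city` are hypothetical). It relies on patsy's `build_design_matrices`, which encodes a single new observation with exactly the training columns, including categorical levels that the new row does not contain. How to serialize the object between processes is not shown here.

```python
import pandas as pd
from patsy import build_design_matrices, dmatrix

# Hypothetical training data: "x" is numeric, "city" is categorical.
train = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "city": ["a", "b", "c", "a"]})

# Build the training design matrix and keep its DesignInfo, which records
# the formula's terms and the categorical levels seen during training.
X_train = dmatrix("x + C(city)", train)
design_info = X_train.design_info

# At prediction time, one observation is encoded with the same columns
# as the training matrix, even though it mentions only one city.
new_obs = pd.DataFrame({"x": [2.5], "city": ["b"]})
(X_new,) = build_design_matrices([design_info], new_obs)

print(design_info.column_names)
print(X_new)
```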
17 votes, 3 answers

Python: How to evaluate the residuals in StatsModels?

I want to evaluate the residuals (y - ŷ). I know how to do that: df = pd.read_csv('myFile', delim_whitespace = True, header = None) df.columns = ['column1', 'column2'] y, X = ps.dmatrices('column1 ~ column2',data = df, return_type =…
DanielTheRocketMan
8 votes, 2 answers

Patsy: New levels in categorical fields in test data

I am trying to use Patsy (with sklearn, pandas) to create a simple regression model. The R-style formula creation is a major draw. My data contains a field called 'ship_city' which can have any city in India. Since I am partitioning the data…
DaSarfyCode
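A minimal sketch, assuming the complete list of cities is known before the train/test split (the names in `all_cities` are hypothetical). With plain string data, patsy learns levels from the training rows only, so a city that appears only in the test rows raises a `PatsyError`. Declaring the column as a pandas `Categorical` with every category up front makes patsy reserve a column for each level.

```python
import pandas as pd
from patsy import PatsyError, build_design_matrices, dmatrix

# Hypothetical full list of cities, known before partitioning the data.
all_cities = ["Delhi", "Mumbai", "Pune"]

train = pd.DataFrame({"ship_city": ["Delhi", "Mumbai", "Delhi"]})
test = pd.DataFrame({"ship_city": ["Pune"]})

# 1) Plain strings: levels come from the training data only,
#    so encoding the unseen city "Pune" fails.
info_plain = dmatrix("ship_city", train).design_info
try:
    build_design_matrices([info_plain], test)
    unseen_failed = False
except PatsyError:
    unseen_failed = True

# 2) pandas Categorical with all categories declared: patsy takes its
#    levels from the categories, including ones unused in training.
def as_city_categorical(frame):
    out = frame.copy()
    out["ship_city"] = pd.Categorical(out["ship_city"], categories=all_cities)
    return out

info_cat = dmatrix("ship_city", as_city_categorical(train)).design_info
(X_test,) = build_design_matrices([info_cat], as_city_categorical(test))
print(info_cat.column_names, X_test)
```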
7 votes, 1 answer

PatsyError: Number of rows mismatch between data argument and column (statsmodels)

I'm working with statsmodels using R-style formulas with the Patsy package and receiving an error I can't make heads or tails of; any tips or tricks would be greatly appreciated. PatsyError: Number of rows mismatch between data argument and …
R_Queery
7 votes, 1 answer

using ols from statsmodels.formula.api - how to remove constant term?

I'm following the first example in the statsmodels tutorial: http://statsmodels.sourceforge.net/devel/ How do I specify that ols should not use a constant term for the linear fit? # Fit regression model (using the natural log of one of the regressors) results =…
denfromufa
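A minimal sketch on hypothetical toy data: in a patsy formula, `- 1` removes the intercept, and `0 +` is an equivalent spelling. Both fits below end up with only the slope on `x`.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data for illustration.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "y": [2.1, 3.9, 6.2, 7.8, 10.1]})

with_const = smf.ols("y ~ x", data=df).fit()

# "- 1" drops the constant term; "0 + x" means the same thing.
no_const = smf.ols("y ~ x - 1", data=df).fit()
no_const_alt = smf.ols("y ~ 0 + x", data=df).fit()

print(with_const.params)
print(no_const.params)
```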
6 votes, 2 answers

Statsmodels: Short way of writing Formula

Logistic regression model using statsmodels: log_reg = st.logit(formula = 'label ~ pregnant + glucose + bp + insulin + bmi + pedigree + age', data=pima).fit() Is there a shorter way of writing the second part of the formula (pregnant + glucose + bp +…
BhushanD
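A minimal sketch, assuming a small hypothetical stand-in for the `pima` frame with only a few of its columns. As far as I know, patsy formulas do not support R's `.` shorthand, so the right-hand side is built from the column names; the resulting string can be passed to `st.logit` in place of the hand-written formula.

```python
import pandas as pd
import patsy

# Hypothetical stand-in for the Pima data (subset of columns).
pima = pd.DataFrame({
    "label": [0, 1, 0, 1, 1],
    "pregnant": [1, 3, 0, 5, 2],
    "glucose": [89, 150, 95, 160, 140],
    "age": [21, 45, 23, 50, 33],
})

# Every column except the response becomes a predictor.
predictors = [col for col in pima.columns if col != "label"]
formula = "label ~ " + " + ".join(predictors)

# Check which design-matrix columns the generated formula produces.
y, X = patsy.dmatrices(formula, pima, return_type="dataframe")
print(formula)
print(list(X.columns))
```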
5 votes, 1 answer

Using ols function with parameters that contain numbers/spaces

I am having a lot of difficulty using the statsmodels.formula.api function ols(formula,data).fit().rsquared_adj due to the names of my predictors. The predictors have numbers, spaces, etc. in them, which it clearly doesn't…
Thomas
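A minimal sketch on hypothetical data whose column names contain spaces and start with digits. Patsy's built-in `Q("...")` looks a name up as a column, so such names can be used in a formula without renaming the DataFrame.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical predictors with awkward names.
df = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 5.0, 8.0, 13.0],
    "feature 1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "2nd var": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
})

# Q("...") quotes a column name that is not a valid Python identifier.
results = smf.ols('y ~ Q("feature 1") + Q("2nd var")', data=df).fit()
print(results.rsquared_adj)
```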
5 votes, 1 answer

python logistic regression (beginner)

I'm working on teaching myself a bit of logistic regression using Python. I'm trying to apply the lessons in the walkthrough here to the small dataset in the Wikipedia entry here. Something doesn't seem quite right. Wikipedia and Excel Solver…
4 votes, 1 answer

How to run an sm.Logit regression on a patsy matrix?

I wanted to create a Logit plot for a natural cubic spline function with four degrees of freedom for P(wage > 250), but an error occurs for some reason. I don't understand why, because OLS works fine. Here is the code (it should be fully working…
user9102437
4 votes, 1 answer

How to plot the confidence interval for statsmodels fit?

I wanted to show the confidence interval on the plot which I have made for the cubic spline of the data, but I have no idea how it should be done. From theory, I know that the CI should diverge from the fitted line when we get closer to the edges,…
user9102437
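A minimal sketch, assuming synthetic data and patsy's `bs()` cubic B-spline (degree 3 by default) as the cubic spline. For a formula-based OLS fit, statsmodels' `get_prediction(...).summary_frame(alpha=0.05)` returns the fitted mean together with `mean_ci_lower` and `mean_ci_upper`, a pointwise 95% confidence band for the mean.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data standing in for the question's data set.
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": np.sort(rng.uniform(0, 10, 200))})
df["y"] = np.sin(df["x"]) + rng.normal(scale=0.3, size=len(df))

# Cubic B-spline with 4 degrees of freedom, plus the usual intercept.
results = smf.ols("y ~ bs(x, df=4)", data=df).fit()

# Grid inside the training range (bs() keeps the training bounds).
grid = pd.DataFrame({"x": np.linspace(df["x"].min(), df["x"].max(), 100)})
band = results.get_prediction(grid).summary_frame(alpha=0.05)
print(band[["mean", "mean_ci_lower", "mean_ci_upper"]].head())
```

To draw the band, `grid["x"]`, `band["mean_ci_lower"]` and `band["mean_ci_upper"]` can be passed to matplotlib's `fill_between`.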
3 votes, 1 answer

Python way to automatically test interaction effects in OLS

In R you can essentially write model='Lottery ~ (Literacy + Wealth + Region)^k' and get every k-way combination of those variables. statsmodels supports some R-style OLS regressions, but they don't seem to support the ^k syntax. I have a large…
LMGagne
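A minimal sketch with hypothetical numeric columns `a`, `b`, `c` standing in for Literacy, Wealth and Region. Patsy writes R's `^k` as `**k`: `(a + b + c)**k` expands to all main effects plus every interaction up to order k, and the same string works inside a statsmodels formula such as `smf.ols("Lottery ~ (Literacy + Wealth + Region)**2", ...)`.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical numeric columns for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "b": [2.0, 1.0, 0.0, 1.0, 2.0],
                   "c": [5.0, 3.0, 4.0, 1.0, 2.0]})

# Collect the generated term names for 2-way and 3-way expansions.
term_names_by_k = {
    k: dmatrix(f"(a + b + c)**{k}", df).design_info.term_names
    for k in (2, 3)
}
for k, names in term_names_by_k.items():
    print(k, names)
```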
3 votes, 1 answer

Clustered standard errors in statsmodels with categorical variables (Python)

I want to run a regression in statsmodels that uses categorical variables and clustered standard errors. I have a dataset with columns institution, treatment, year, and enrollment. Treatment is a dummy, institution is a string, and the others are…
tower489
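A minimal sketch on a synthetic panel using the question's column names (the values are made up). `C()` marks `institution` and `year` as categorical, and `fit(cov_type="cluster", cov_kwds={"groups": ...})` clusters the standard errors by institution. The groups are converted to integer codes with `pd.factorize`, and they must line up row for row with the data actually used in the fit (no rows are dropped for missing values here).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic panel: 10 institutions observed over 5 years.
rng = np.random.default_rng(2)
institutions = [f"inst{i}" for i in range(10)]
df = pd.DataFrame({
    "institution": np.repeat(institutions, 5),
    "year": np.tile(np.arange(2015, 2020), 10),
    "treatment": rng.integers(0, 2, size=50),
})
df["enrollment"] = 100 + 5 * df["treatment"] + rng.normal(scale=3, size=50)

# Integer cluster labels, one per row, in the same order as df.
groups = pd.factorize(df["institution"])[0]

# Institution and year fixed effects; errors clustered by institution.
results = smf.ols(
    "enrollment ~ treatment + C(institution) + C(year)", data=df
).fit(cov_type="cluster", cov_kwds={"groups": groups})
print(results.bse["treatment"])
```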
3 votes, 0 answers

Weighted Least Squares in Statsmodels vs. Numpy?

I am trying to replicate the functionality of Statsmodels' weighted least squares (WLS) function with Numpy's ordinary least squares (OLS) function (i.e. Numpy refers to OLS as just "least squares"). In other words, I want to compute the WLS in…
Code Doggo
3 votes, 3 answers

How do I get the columns that a statsmodels / patsy formula depends on?

Suppose I have a pandas dataframe: df = pd.DataFrame({'x1': [0, 1, 2, 3, 4], 'x2': [10, 9, 8, 7, 6], 'x3': [.1, .1, .2, 4, 8], 'y': [17, 18, 19, 20, 21]}) Now I fit a statsmodels model…
bwk
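A heuristic sketch, not a built-in patsy feature: it walks the `terms` of each `DesignInfo`, parses each factor's Python source (`EvalFactor.code`) with the standard `ast` module, and keeps the names that are also DataFrame columns. The helper `formula_columns` is hypothetical; it does not detect columns referenced through `Q("...")`, because those appear as string literals rather than names.

```python
import ast

import pandas as pd
import patsy

df = pd.DataFrame({'x1': [0, 1, 2, 3, 4], 'x2': [10, 9, 8, 7, 6],
                   'x3': [.1, .1, .2, 4, 8], 'y': [17, 18, 19, 20, 21]})

def formula_columns(formula, data):
    """Hypothetical helper: columns of `data` referenced by the formula's factors."""
    y_mat, X_mat = patsy.dmatrices(formula, data)
    names = set()
    for info in (y_mat.design_info, X_mat.design_info):
        for term in info.terms:
            for factor in term.factors:
                # EvalFactor keeps its Python source in .code; skip other factor types.
                code = getattr(factor, "code", None)
                if code is None:
                    continue
                tree = ast.parse(code, mode="eval")
                names.update(node.id for node in ast.walk(tree)
                             if isinstance(node, ast.Name))
    # Keep the DataFrame's column order.
    return [col for col in data.columns if col in names]

print(formula_columns("y ~ x1 + I(x3 ** 2)", df))
```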
3 votes, 2 answers

Namespace issues when calling patsy within a function

I am attempting to write a wrapper for the statsmodels formula API (this is a simplified version, the function does more than this): import statsmodels.formula.api as smf def wrapper(formula, data, **kwargs): return smf.logit(formula,…
chriswhite
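A minimal sketch of the namespace problem at the patsy level; the functions `wrapper` and `caller` and the variable `scale` are hypothetical. Patsy resolves names that are not columns of the data in a captured environment, and `eval_env=0` means the function that calls `dmatrix`. Inside a wrapper, `eval_env=1` moves the lookup one frame up, to the wrapper's caller.

```python
import numpy as np
import pandas as pd
import patsy

def wrapper(formula, data):
    # eval_env=1: resolve names that are not columns of `data`
    # in the frame that called wrapper(), not in wrapper() itself.
    return patsy.dmatrix(formula, data, eval_env=1)

def caller():
    scale = 10.0  # local variable visible only in this function
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
    return wrapper("I(x * scale) - 1", df)

print(np.asarray(caller()).ravel())
```

statsmodels' formula constructors such as `smf.logit` also accept an `eval_env` keyword that is forwarded to patsy; capturing `patsy.EvalEnvironment.capture(1)` inside the wrapper and passing that object along is one option, but its exact depth handling in statsmodels is an assumption worth checking against the statsmodels documentation.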