Questions tagged [statsmodels]

Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests.

Homepage: http://www.statsmodels.org/

An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator. Features include:

  • Linear regression models
  • Generalized linear models
  • Discrete choice models
  • Robust linear models
  • Many models and functions for time series analysis
  • Nonparametric estimators
  • A collection of datasets for examples
  • A wide range of statistical tests
  • Input-output tools for producing tables in a number of formats (text, LaTeX, HTML) and for reading Stata files into NumPy and pandas
  • Plotting functions
  • Extensive unit tests to ensure correctness of results
  • Many more models and extensions in development
2841 questions
297 votes, 11 answers

How to iterate over columns of pandas dataframe to run regression

I have this code using Pandas in Python: all_data = {} for ticker in ['FIUIX', 'FSAIX', 'FSAVX', 'FSTMX']: all_data[ticker] = web.get_data_yahoo(ticker, '1/1/2010', '1/1/2015') prices = DataFrame({tic: data['Adj Close'] for tic, data in…
itzy
133 votes, 6 answers

Run an OLS regression with Pandas Data Frame

I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example: import pandas as pd df = pd.DataFrame({"A": [10,20,30,40,50], "B": [20, 30, 10, 40,…
Michael
116 votes, 7 answers

Weighted standard deviation in NumPy

numpy.average() has a weights option, but numpy.std() does not. Does anyone have suggestions for a workaround?
YGA
89 votes, 13 answers

ValueError: numpy.dtype has the wrong size, try recompiling

I just installed the pandas and statsmodels packages on my Python 2.7. When I tried "import pandas as pd", this error message came out. Can anyone help? Thanks!!! numpy.dtype has the wrong size, try recompiling Traceback (most recent call last): File…
Amber Chen
88 votes, 10 answers

auto.arima() equivalent for python

I am trying to predict weekly sales using ARMA/ARIMA models. I could not find a function for tuning the order (p,d,q) in statsmodels. Currently R has a function forecast::auto.arima() which will tune the (p,d,q) parameters. How do I go about…
Ajax
64 votes, 5 answers

Pythonic way of detecting outliers in one dimensional observation data

For the given data, I want to set the outlier values (defined by a 95% confidence level or 95% quantile function or anything that is required) as nan values. Following is my data and the code that I am using right now. I would be glad if someone could…
user3410943
62 votes, 9 answers

Variance Inflation Factor in Python

I'm trying to calculate the variance inflation factor (VIF) for each column in a simple dataset in python: a b c d 1 2 4 4 1 2 6 3 2 3 7 4 3 2 8 5 4 1 9 4 I have already done this in R using the vif function from the usdm library which gives the…
Nizag
61 votes, 7 answers

confidence and prediction intervals with StatsModels

I do this linear regression with StatsModels: import numpy as np import statsmodels.api as sm from statsmodels.sandbox.regression.predstd import wls_prediction_std n = 100 x = np.linspace(0, 10, n) e = np.random.normal(size=n) y = 1 + 0.5*x +…
F.N.B
54 votes, 6 answers

Why do I get only one parameter from a statsmodels OLS fit

Here is what I am doing: $ python Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin >>> import statsmodels.api as sm >>> statsmodels.__version__ '0.5.0' >>> import numpy >>> y =…
Tom
46 votes, 5 answers

Print 'std err' value from statsmodels OLS results

(Sorry to ask but http://statsmodels.sourceforge.net/ is currently down and I can't access the docs) I'm doing a linear regression using statsmodels, basically: import statsmodels.api as sm model = sm.OLS(y,x) results = model.fit() I know that I…
Gabriel
45 votes, 4 answers

Building multi-regression model throws error: `Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).`

I have a pandas dataframe with some categorical predictors (i.e. variables) as 0 & 1, and some numeric variables. When I fit that to a statsmodels model like: est = sm.OLS(y, X).fit() It throws: Pandas data cast to numpy dtype of object. Check input data…
Sanoj
43 votes, 10 answers

Where can I find mad (mean absolute deviation) in scipy?

It seems scipy once provided a function mad to calculate the mean absolute deviation for a set of numbers: http://projects.scipy.org/scipy/browser/trunk/scipy/stats/models/utils.py?rev=3473 However, I cannot find it anywhere in current versions of…
Ton van den Heuvel
43 votes, 5 answers

How to extract the regression coefficient from statsmodels.api?

result = sm.OLS(gold_lookback, silver_lookback).fit() After I get the result, how can I get the coefficient and the constant? In other words, if y = ax + c, how do I get the values of a and c?
JOHN
42 votes, 3 answers

What's the difference between pandas ACF and statsmodel ACF?

I'm calculating the Autocorrelation Function for a stock's returns. To do so I tested two functions, the autocorr function built into Pandas, and the acf function supplied by statsmodels.tsa. This is done in the following MWE: import pandas as…
BML91
41 votes, 7 answers

Highest Posterior Density Region and Central Credible Region

Given a posterior p(Θ|D) over some parameters Θ, one can define the following: Highest Posterior Density Region: The Highest Posterior Density Region is the set of most probable values of Θ that, in total, constitute 100(1-α) % of the posterior…
Amelio Vazquez-Reina