Questions tagged [chi-squared]

Anything related to the chi-squared probability distribution or the chi-squared statistical test (typically of distribution, independence, or goodness of fit).

In probability theory and statistics, the chi-squared (χ²) distribution with k degrees of freedom is the distribution of the sum of the squares of k independent standard normal random variables. It is one of the most widely used probability distributions in inferential statistics (for example, in hypothesis testing or in the construction of confidence intervals).
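
The definition above can be checked with a small simulation. The following is a minimal sketch, not a reference implementation: the helper name `sample_chi_squared`, the seed, and the sample size are illustrative assumptions. It uses only the Python standard library, and compares the sample moments with the known mean (k) and variance (2k) of the distribution.

```python
import random
import statistics


def sample_chi_squared(k, n_samples, seed=0):
    """Draw n_samples values, each the sum of squares of k standard normals.

    Hypothetical helper for illustration; seeded so the run is repeatable.
    """
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_samples)
    ]


k = 3
samples = sample_chi_squared(k, n_samples=50_000)

# A chi-squared variable with k degrees of freedom has mean k and variance 2k,
# so the sample moments should land close to 3 and 6 here.
sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)
print(f"mean = {sample_mean:.3f} (theory: {k})")
print(f"variance = {sample_var:.3f} (theory: {2 * k})")
```

In practice a library routine (for example, the chi-squared distribution object in SciPy's `scipy.stats` module) would normally be used instead of hand-rolled sampling; the loop here only makes the sum-of-squares construction explicit.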

Tag usage

Questions on this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

643 questions
50
votes
7 answers

P-value from Chi sq test statistic in Python

I have computed a test statistic that is distributed as a chi square with 1 degree of freedom, and want to find out what P-value this corresponds to using python. I'm a python and maths/stats newbie so I think what I want here is the probability…
Davy Kavanagh
  • 4,809
  • 9
  • 35
  • 50
39
votes
2 answers

Feature selection using scikit-learn

I'm new to machine learning. I'm preparing my data for classification using Scikit Learn SVM. In order to select the best features I have used the following method: SelectKBest(chi2, k=10).fit_transform(A1, A2). Since my dataset consists of negative…
22
votes
4 answers

Plot the equivalent of correlation matrix for factors (categorical data)? And mixed types?

Actually there are 2 questions, one is more advanced than the other. Q1: I am looking for a method that is similar to corrplot() but can deal with factors. I originally tried to use chisq.test() then calculate the p-value and Cramer's V as correlation,…
J.D
  • 1,885
  • 4
  • 11
  • 19
18
votes
1 answer

Fisher test error : LDSTP is too small

input NN <- c(359,32);JJ <- c(108,13);NNS <- c(103,15);VBN <- c(95,9);RB <- c(63,11);NNP <- c(56,0);VBG <- c(55,10);IN <- c(38,16);VB <- c(20,10);CD <- c(17,6);CC <- c(11,6);DT <- c(11,4);MD <- c(8,5);PRP4 <- c(8,1);PRP <- c(7,4);FW <- c(5,1);VBD <-…
Choijaeyoung
  • 297
  • 2
  • 3
  • 9
15
votes
2 answers

Chi squared test in Python

I'd like to run a chi-squared test in Python. I've created code to do this, but I don't know if what I'm doing is right, because the scipy docs are quite sparse. Background first: I have two groups of users. My null hypothesis is that there is no…
Richard
  • 62,943
  • 126
  • 334
  • 542
13
votes
1 answer

Sklearn Chi2 For Feature Selection

I'm learning about chi2 for feature selection and came across code like this However, my understanding of chi2 was that higher scores mean that the feature is more independent (and therefore less useful to the model) and so we would be interested in…
RSHAP
  • 2,337
  • 3
  • 28
  • 39
13
votes
2 answers

Chi-squared goodness of fit test in R

I have a vector of observed values and also a vector of values calculated with model: actual <- c(1411,439,214,100,62,38,29,64) expected <- c(1425.3,399.5,201.6,116.9,72.2,46.3,30.4,64.8) Now I'm using the Chi-squared goodness of fit test to see…
AliCivil
  • 2,003
  • 6
  • 28
  • 43
12
votes
1 answer

How to obtain the chi squared value as an output of scipy.optimize.curve_fit?

Is it possible to obtain the value of the chi squared as a direct output of scipy.optimize.curve_fit()? Usually, it is easy to compute it after the fit by squaring the difference between the model and the data, weighting by the uncertainties and…
Stefano
  • 359
  • 1
  • 5
  • 16
10
votes
3 answers

Is there a python equivalent of R's qchisq function?

The R qchisq function converts a p-value and number of degrees of freedom to the corresponding chi-squared value. Is there a Python library that has an equivalent? I've looked around in SciPy without finding anything.
jveldridge
  • 1,155
  • 3
  • 11
  • 21
8
votes
1 answer

scikit learn: desired amount of Best Features (k) not selected

I am trying to select the best features using chi-square (scikit-learn 0.10). From a total of 80 training documents I first extract 227 features, and from these 227 features I want to select the top 10 ones. my_vectorizer =…
D T
  • 677
  • 12
  • 23
7
votes
2 answers

Chi Square Analysis using for loop in R

I'm trying to do chi square analysis for all combinations of variables in the data and my code is: Data <- esoph[ , 1:3] OldStatistic <- NA for(i in 1:(ncol(Data)-1)){ for(j in (i+1):ncol(Data)){ Statistic <- data.frame("Row"=colnames(Data)[i],…
MYaseen208
  • 22,666
  • 37
  • 165
  • 309
7
votes
1 answer

How SelectKBest (chi2) calculates score?

I am trying to find the most valuable features by applying feature selection methods to my dataset. I'm using the SelectKBest function for now. I can generate the score values and sort them as I want, but I don't understand exactly how this score…
6
votes
1 answer

Chi Square Analysis - expected frequencies has a zero element at (0,). error

I am working on the data where I am trying to see the association between two variables and I used Chi-Square analysis in Scipy package in Python. Here is the crosstab result of the two…
Jack Daniel
  • 2,527
  • 3
  • 31
  • 52
6
votes
0 answers

Param not changing for std::chi_squared_distribution

As per the answer to this question I have attempted to change the parameter of a distribution by using .param(). Below is a toy example where I'm trying to do this. For both a chi-squared and a normal distribution I have a function that…
Richard Redding
  • 327
  • 2
  • 11
6
votes
3 answers

Chi-squared test of independence on all combinations of columns in a dataframe in R

This is my first time posting here and I hope this is all in the right place. I have been using R for basic statistical analysis for some time, but haven't really used it for anything computationally challenging and I'm very much a beginner in the…
YJS
  • 61
  • 1
  • 2