Questions tagged [feature-selection]

In machine learning, feature selection is the process of selecting a subset of the most relevant features for constructing your model.

Feature selection is an important step to remove irrelevant or redundant features from our data. For more details, see Wikipedia.

1533 questions
143
votes
7 answers

How are feature_importances in RandomForestClassifier determined?

I have a classification task with a time-series as the data input, where each attribute (n=23) represents a specific point in time. Besides the absolute classification result, I would like to find out which attributes/dates contribute to the result…
user2244670
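A minimal sketch of how these importances are usually read off a fitted model, assuming scikit-learn's impurity-based `feature_importances_` (the data here is a hypothetical stand-in for the 23-attribute input described above):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the 23-attribute time-series data
X, y = make_classification(n_samples=200, n_features=23, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ is the mean decrease in impurity per feature,
# averaged over all trees and normalized to sum to 1
ranked = sorted(enumerate(clf.feature_importances_),
                key=lambda t: t[1], reverse=True)
for idx, imp in ranked[:5]:
    print(f"attribute {idx}: {imp:.3f}")
```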
82
votes
9 answers

The easiest way to get feature names after running SelectKBest in scikit-learn

I'm trying to conduct a supervised machine-learning experiment using the SelectKBest feature of scikit-learn, but I'm not sure how to create a new dataframe after finding the best features: Let's assume I would like to conduct the experiment…
Aviade
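One common approach, sketched on a hypothetical dataframe: `SelectKBest.get_support()` gives a boolean mask over the input columns, which can be used to rebuild a dataframe with only the selected features.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

selector = SelectKBest(f_classif, k=2).fit(df, iris.target)
mask = selector.get_support()        # boolean mask of kept columns
selected = df.columns[mask]          # names of the best features
df_new = df[selected]                # new dataframe with only those columns
print(list(selected))
```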
77
votes
4 answers

Linear regression analysis with string/categorical features (variables)?

Regression algorithms seem to be working on features represented as numbers. For example: This data set doesn't contain categorical features/variables. It's quite clear how to do regression on this data and predict price. But now I want to do a…
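The usual answer is to one-hot encode the string column before fitting; a minimal sketch on made-up housing data (column names and values are hypothetical):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: 'city' is a string/categorical feature
df = pd.DataFrame({
    "sqft": [700, 900, 1200, 1500],
    "city": ["A", "B", "A", "B"],
    "price": [100, 150, 180, 240],
})

# One-hot encode the categorical column into numeric indicator columns
X = pd.get_dummies(df[["sqft", "city"]], columns=["city"])
model = LinearRegression().fit(X, df["price"])
print(X.columns.tolist())
print(model.predict(X))
```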
75
votes
3 answers

Feature/Variable importance after a PCA analysis

I have performed a PCA analysis over my original dataset and from the compressed dataset transformed by the PCA I have also selected the number of PCs I want to keep (they explain almost 94% of the variance). Now I am struggling with the…
fbm
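One common way to relate principal components back to the original variables is to inspect `pca.components_` (the loadings); a minimal sketch on a substitute dataset:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X)

# components_ has shape (n_components, n_features); the absolute value of
# each entry is the loading of that original feature on the component
loadings = np.abs(pca.components_)
for i, row in enumerate(loadings):
    print(f"PC{i + 1} is dominated by original feature index {row.argmax()}")
print(f"explained variance kept: {pca.explained_variance_ratio_.sum():.2%}")
```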
52
votes
8 answers

Random Forest Feature Importance Chart using Python

I am working with RandomForestRegressor in python and I want to create a chart that will illustrate the ranking of feature importance. This is the code I used: from sklearn.ensemble import RandomForestRegressor MT= pd.read_csv("MT_reduced.csv") df…
user348547
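A minimal sketch of one way to draw such a ranking chart with matplotlib (the data, feature names, and output filename here are hypothetical, standing in for the CSV mentioned above):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the figure can be saved to disk
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=8, random_state=0)
names = np.array([f"f{i}" for i in range(X.shape[1])])  # hypothetical names
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

order = np.argsort(reg.feature_importances_)  # ascending, for a barh chart
plt.barh(names[order], reg.feature_importances_[order])
plt.xlabel("feature importance")
plt.tight_layout()
plt.savefig("importance.png")
```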
47
votes
2 answers

How is the feature score (importance) in the XGBoost package calculated?

The command xgb.importance returns a graph of feature importance measured by an f score. What does this f score represent and how is it calculated? Output: Graph of feature importance
ishido
43
votes
1 answer

Understanding the `ngram_range` argument in a CountVectorizer in sklearn

I'm a little confused about how to use ngrams in the scikit-learn library in Python, specifically, how the ngram_range argument works in a CountVectorizer. Running this code: from sklearn.feature_extraction.text import CountVectorizer vocabulary…
tumultous_rooster
40
votes
2 answers

Correlated features and classification accuracy

I'd like to ask everyone a question about how correlated features (variables) affect the classification accuracy of machine learning algorithms. With correlated features I mean a correlation between them and not with the target class (i.e the…
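One practical step often paired with this question is detecting near-duplicate features before training; a minimal sketch of a correlation-based filter on synthetic data (the 0.95 threshold is an arbitrary, hypothetical choice):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=500)
df = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.01, size=500),  # ~perfectly correlated with a
    "c": rng.normal(size=500),                  # independent
})

corr = df.corr().abs()
# keep only the upper triangle so each pair is considered once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
print(to_drop)  # features highly correlated with an earlier feature
```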
39
votes
2 answers

Feature selection using scikit-learn

I'm new to machine learning. I'm preparing my data for classification using Scikit Learn SVM. In order to select the best features I have used the following method: SelectKBest(chi2, k=10).fit_transform(A1, A2) Since my dataset consists of negative…
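The chi-squared test requires non-negative features, so with negative values there are two common workarounds, sketched below on synthetic stand-ins for `A1`/`A2`: rescale the data first, or switch to a score function that accepts negative inputs.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2, f_classif
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
A1 = rng.normal(size=(100, 20))          # contains negative values
A2 = rng.integers(0, 2, size=100)

# Option 1: rescale to [0, 1] so chi2 is applicable
A1_pos = MinMaxScaler().fit_transform(A1)
X_chi = SelectKBest(chi2, k=10).fit_transform(A1_pos, A2)

# Option 2: use a score function (ANOVA F-test) that allows negative values
X_f = SelectKBest(f_classif, k=10).fit_transform(A1, A2)
print(X_chi.shape, X_f.shape)
```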
39
votes
3 answers

Understanding max_features parameter in RandomForestRegressor

While constructing each tree in the random forest using bootstrapped samples, for each terminal node, we select m variables at random from p variables to find the best split (p is the total number of features in your data). My questions (for…
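For reference, `max_features` is the m in that description, sampled anew at each split (every internal node, not only terminal nodes); it accepts an int, a float fraction of p, or a string rule. A minimal sketch:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=100, n_features=9, random_state=0)

# m candidate features are drawn at random at each split
for mf in (3, 0.5, "sqrt"):  # 3 features, 50% of p, or sqrt(p)
    reg = RandomForestRegressor(n_estimators=10, max_features=mf,
                                random_state=0).fit(X, y)
    print(f"max_features={mf!r}: R^2 on train = {reg.score(X, y):.3f}")
```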
32
votes
3 answers

Information Gain calculation with Scikit-learn

I am using Scikit-learn for text classification. I want to calculate the Information Gain for each attribute with respect to a class in a (sparse) document-term matrix. The Information Gain is defined as H(Class) - H(Class | Attribute), where H is…
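H(Class) - H(Class | Attribute) is exactly the mutual information between attribute and class, so one common route is `mutual_info_classif`; a minimal sketch on a synthetic discrete matrix where the class equals the first attribute:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 3)).astype(float)
y = X[:, 0].astype(int)  # class is an exact copy of attribute 0

# Information Gain = H(Class) - H(Class | Attribute) = mutual information;
# discrete_features=True treats the columns as discrete counts
ig = mutual_info_classif(X, y, discrete_features=True, random_state=0)
print(ig)  # attribute 0 should score near H(y) ~ ln 2; the others near 0
```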
32
votes
2 answers

TypeError: only integer arrays with one element can be converted to an index

I'm getting the following error when performing recursive feature selection with cross-validation: Traceback (most recent call last): File "/Users/.../srl/main.py", line 32, in argident_sys.train_classifier() File…
feralvam
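For context, a minimal working sketch of recursive feature elimination with cross-validation (`RFECV`), the technique the traceback above comes from, on a synthetic substitute for the SRL data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)

# RFECV repeatedly drops the weakest features, picking the count by CV score
selector = RFECV(LogisticRegression(max_iter=1000), cv=5).fit(X, y)
print(selector.n_features_)   # number of features kept
print(selector.support_)      # boolean mask over the original columns
```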
30
votes
8 answers

Scikit-Learn Linear Regression how to get coefficient's respective features?

I'm trying to perform feature selection by evaluating my regression's coefficient outputs, and select the features with the highest magnitude coefficients. The problem is, I don't know how to get the respective features, as only coefficients are…
jeffrey
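`coef_` follows the column order of the training matrix, so pairing coefficients with names is a matter of zipping them back together; a minimal sketch on a bundled dataset:

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

data = load_diabetes()
X = pd.DataFrame(data.data, columns=data.feature_names)
model = LinearRegression().fit(X, data.target)

# coef_ is aligned with X's columns, so index it by the column names
coef = pd.Series(model.coef_, index=X.columns)
print(coef.abs().sort_values(ascending=False).head())  # largest-magnitude first
```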
29
votes
2 answers

scikit learn - feature importance calculation in decision trees

I'm trying to understand how feature importance is calculated for decision trees in scikit-learn. This question has been asked before, but I am unable to reproduce the results the algorithm is providing. For example: from StringIO import…
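The computation can be reproduced by hand from the fitted `tree_` attributes: each split contributes its weighted impurity decrease to the splitting feature, and the totals are normalized to sum to 1. A sketch that checks this against `feature_importances_`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
t = clf.tree_

# importance[f] = sum over f's split nodes of
#   N_node/N * (impurity - N_left/N_node*impurity_L - N_right/N_node*impurity_R)
imp = np.zeros(X.shape[1])
for node in range(t.node_count):
    if t.children_left[node] == -1:  # leaf node: no split, no contribution
        continue
    l, r = t.children_left[node], t.children_right[node]
    n = t.weighted_n_node_samples[node]
    nl = t.weighted_n_node_samples[l]
    nr = t.weighted_n_node_samples[r]
    imp[t.feature[node]] += (n / t.weighted_n_node_samples[0]) * (
        t.impurity[node] - (nl / n) * t.impurity[l] - (nr / n) * t.impurity[r])
imp /= imp.sum()  # normalize, matching scikit-learn's convention
print(imp)
```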
28
votes
2 answers

Should Feature Selection be done before Train-Test Split or after?

There is an apparent contradiction between two possible answers to this question: the conventional answer is to do it after splitting, since doing it before can leak information from the test set. The contradicting answer is…
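The standard way to sidestep the leakage problem is to put the selector inside a Pipeline, so it is re-fit on each training fold only; a minimal sketch:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Selection inside the Pipeline is fit only on each training fold, so no
# information from held-out data leaks into the chosen feature set
pipe = make_pipeline(SelectKBest(f_classif, k=10), SVC())
print(cross_val_score(pipe, X_tr, y_tr, cv=5).mean())

pipe.fit(X_tr, y_tr)
print(pipe.score(X_te, y_te))  # unbiased estimate on untouched test data
```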