Highest Voted 'sklearn-pandas' Questions

98

votes

6 answers

How to one-hot-encode from a pandas column containing a list?

I would like to break down a pandas column consisting of a list of elements into as many columns as there are unique elements i.e. one-hot-encode them (with value 1 representing a given element existing in a row and 0 in the case of absence). For…

asked Jul 25 '17 at 19:53

Melsauce

2,535
2
19
39
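A minimal sketch of one way to get the encoding described in this question, assuming scikit-learn's `MultiLabelBinarizer` is acceptable. The frame and the column names `id` and `fruits` are hypothetical, since the question's own example is cut off above.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical data: each row of "fruits" holds a list of labels.
df = pd.DataFrame({
    "id": ["C", "A", "B"],
    "fruits": [["Apple", "Orange", "Banana"], ["Apple", "Grape"], ["Banana"]],
})

# One indicator column per unique element: 1 if present in the row, else 0.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(df["fruits"]),
    columns=mlb.classes_,
    index=df.index,
)

# Replace the list column with the indicator columns.
result = df.drop(columns="fruits").join(encoded)
print(result)
```
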

42

votes

4 answers

Sklearn plot_tree plot is too small

I have this simple code: clf = tree.DecisionTreeClassifier() clf = clf.fit(X, y) tree.plot_tree(clf.fit(X, y)) plt.show() And the result I get is this graph: How do I make this graph legible? I'm using PyCharm Professional 2019.3 as my IDE.

python graphics sklearn-pandas

asked Dec 22 '19 at 19:22

Artur

614
1
6
9

28

votes

4 answers

sklearn stratified sampling based on a column

I have a fairly large CSV file containing amazon review data which I read into a pandas data frame. I want to split the data 80-20(train-test) but while doing so I want to ensure that the split data is proportionally representing the values of one…

python pandas scikit-learn sklearn-pandas

asked May 03 '16 at 06:56

Azee.

703
1
5
12
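As a rough sketch of the 80-20 split this question describes, `train_test_split` accepts a `stratify` argument that keeps a column's proportions similar in both parts. The frame and the `rating` column are made up for illustration. The question does not say which column it wants to stratify on.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical review data: 20% of rows have rating 1, 80% have rating 5.
df = pd.DataFrame({
    "review": [f"text {i}" for i in range(100)],
    "rating": [1] * 20 + [5] * 80,
})

# stratify= keeps the rating proportions the same in train and test.
train, test = train_test_split(
    df, test_size=0.2, stratify=df["rating"], random_state=0
)

print(train["rating"].value_counts(normalize=True))
print(test["rating"].value_counts(normalize=True))
```
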

26

votes

2 answers

python sklearn multiple linear regression display r-squared

I calculated my multiple linear regression equation and I want to see the adjusted R-squared. I know that the score function allows me to see r-squared, but it is not adjusted. import pandas as pd #import the pandas module import numpy as np df =…

python machine-learning sklearn-pandas

asked Feb 03 '17 at 22:02

jeangelj

4,338
16
54
98
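For reference, adjusted R-squared can be derived from the plain R-squared that `score` returns, using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The sketch below uses synthetic data and a hypothetical helper named `adjusted_r2`. It is not the asker's code, which is truncated above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def adjusted_r2(r2, n_samples, n_features):
    """Hypothetical helper: adjust R^2 for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Synthetic data: 50 samples, 3 predictors, small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)               # plain R^2
adj_r2 = adjusted_r2(r2, *X.shape)   # adjusted R^2

print(r2, adj_r2)
```
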

23

votes

3 answers

Using K-means with cosine similarity - Python

I am trying to implement the K-means algorithm in Python using cosine distance instead of Euclidean distance as the distance metric. I understand that using a different distance function can be fatal and should be done carefully. Using cosine distance…

python scikit-learn k-means cosine-similarity sklearn-pandas

asked Sep 25 '17 at 16:22

ise372

231
1
2
5

18

votes

2 answers

Multivariable/Multiple Linear Regression in Scikit Learn?

I have a dataset (dataTrain.csv & dataTest.csv) in .csv format: Temperature(K),Pressure(ATM),CompressibilityFactor(Z) 273.1,24.675,0.806677258 313.1,24.675,0.888394713 ...,...,... And I am able to build a regression model and prediction…

python pandas scikit-learn sklearn-pandas

asked Feb 05 '17 at 18:23

Drizzer Silverberg

193
1
1
7

17

votes

4 answers

Scikit K-means clustering performance measure

I'm trying to do a clustering with K-means method but I would like to measure the performance of my clustering. I'm not an expert but I am eager to learn more about clustering. Here is my code : import pandas as pd from sklearn import…

python machine-learning scikit-learn cluster-analysis sklearn-pandas

asked May 04 '17 at 13:55

Viphone Rathikoun

187
1
1
5

17

votes

6 answers

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0.0

I applied Logistic Regression on the train set after splitting the data set into test and train sets, but I got the above error. I tried to work it out, and when I tried to print my response vector y_train in the console, it printed integer values…

python-2.7 scikit-learn logistic-regression sklearn-pandas

asked Nov 10 '16 at 10:06

Amey Kumar Samala

904
1
7
20

17

votes

4 answers

No module named 'pandas' in Pycharm

I read all the related topics, but I cannot solve my problem: Traceback (most recent call last): File "/home/.../.../.../reading_data.py", line 1, in <module> import pandas as pd ImportError: No module named pandas This is my…

python pandas module pycharm sklearn-pandas

asked Jul 14 '16 at 14:03

ElenaPhys

443
2
5
16

16

votes

2 answers

How to normalize the Train and Test data using MinMaxScaler sklearn

So, I have this doubt and have been looking for answers. So the question is when I use, from sklearn import preprocessing min_max_scaler = preprocessing.MinMaxScaler() df =…

python machine-learning scikit-learn normalization sklearn-pandas

asked May 28 '18 at 11:58

Tia

521
2
6
18
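A common convention for this situation, sketched under the assumption that the test set should not influence the scaling: fit `MinMaxScaler` on the training data only, then reuse the fitted scaler on the test data. The arrays below are made up. The asker's own code is truncated above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical train/test arrays with two features.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.5, 40.0]])

scaler = MinMaxScaler()
# Learn min and max from the training data only.
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same min and max to the test data (values may fall outside [0, 1]).
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled)
print(X_test_scaled)
```
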

16

votes

1 answer

'DataFrame' object has no attribute 'ravel' when transforming target variable?

I was fitting a logistic regression with a subset of a dataset. After splitting the dataset and fitting the model, I got the following error message: /Users/Eddie/anaconda/lib/python3.4/site-packages/sklearn/utils/validation.py:526:…

python numpy logistic-regression sklearn-pandas

asked Feb 17 '18 at 13:07

Edward Lin

609
1
9
16

16

votes

1 answer

Use FeatureUnion in scikit-learn to combine two pandas columns for tfidf

While using this as a model for spam classification, I'd like to add an additional feature of the Subject plus the body. I have all of my features in a pandas dataframe. For example, the subject is df['Subject'], the body is df['body_text'] and the…

pandas scikit-learn sklearn-pandas

asked Jan 10 '16 at 20:11

BLodge

163
1
1
4

15

votes

4 answers

What is the difference between X_test, X_train, y_test, y_train in sklearn?

I'm learning sklearn and I don't fully understand the difference between them, or why the function train_test_split() returns four outputs. I found some examples in the documentation, but they weren't sufficient to resolve my doubts. Does the code use the X_train to…

python machine-learning scikit-learn sklearn-pandas supervised-learning

asked Mar 11 '20 at 12:49

Jancer Lima

744
2
10
19

14

votes

3 answers

Append tfidf to pandas dataframe

I have the following pandas structure: col1 col2 col3 text 1 1 0 meaningful text 5 9 7 trees 7 8 2 text I'd like to vectorise it using a tfidf vectoriser. This, however, returns a sparse matrix, which I can actually turn…

python dataframe tf-idf sklearn-pandas

asked Aug 30 '17 at 13:26

lte__

7,175
25
74
131

14

votes

2 answers

How to load Only column names from csv file (Pandas)?

I have a large CSV file and don't want to load it fully into memory. I only need the column names from this file. How can I load them cleanly?

python-3.x pandas sklearn-pandas

asked May 22 '17 at 09:48

Ivan Shelonik

1,958
5
25
49
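One way to do what this question asks, as a minimal sketch: `pandas.read_csv` with `nrows=0` parses only the header row, so the data rows are never loaded. An in-memory `StringIO` stands in for the large file here. In practice a file path would be passed instead.

```python
import io

import pandas as pd

# Stand-in for a large CSV file (hypothetical contents).
csv_source = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")

# nrows=0 reads the header only and returns an empty frame.
header_only = pd.read_csv(csv_source, nrows=0)
column_names = header_only.columns.tolist()

print(column_names)
```
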
