Questions tagged [scikit-learn]

scikit-learn is a machine-learning library for Python that provides simple and efficient tools for data analysis and data mining. It is built on NumPy, SciPy, and matplotlib, and is designed to be accessible and reusable in various contexts. The project is open source and commercially usable (BSD license).

Resources

Related Libraries

sklearn-pandas - bridge library between scikit-learn and pandas
scikit-image - scikit-learn-compatible API for image processing and computer vision in machine-learning tasks
sklearn laboratory - scikit-learn wrapper for running larger scikit-learn experiments and feature sets
sklearn deap - scikit-learn wrapper for hyperparameter tuning with evolutionary algorithms instead of scikit-learn's grid search
hyperopt-sklearn - hyperparameter optimization for scikit-learn
scikit-plot - visualization library for quickly generating common plots in machine-learning studies
sklearn-porter - library for turning trained scikit-learn models into compiled C, Java, or JavaScript code
sklearn_theano - scikit-learn-compatible objects (estimators, transformers, and datasets) that use Theano internally
sparkit-learn - scikit-learn API that uses PySpark's distributed computing model
joblib - parallelization library used by scikit-learn

28024 questions

320 votes, 25 answers

Label encoding across multiple columns in scikit-learn

I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. As the dataframe has many (50+) columns, I want to avoid creating a LabelEncoder object for each column; I'd rather just have one big LabelEncoder objects…

python pandas scikit-learn

asked Jun 27 '14 at 18:29

Bryan

5,999
9
29
50
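
For the label-encoding question above, one common approach is to keep a separate fitted LabelEncoder per column in a dictionary rather than forcing a single encoder over all columns. The following is only a minimal sketch: the toy DataFrame and the names `encoders`, `encoded`, and `restored` are hypothetical, not taken from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame standing in for the 50+ string columns.
df = pd.DataFrame({
    "pet": ["cat", "dog", "cat"],
    "color": ["red", "blue", "red"],
})

# Fit one encoder per column and keep them so each mapping can be inverted.
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}

encoded = df.copy()
for col, enc in encoders.items():
    encoded[col] = enc.transform(df[col])

# Round trip back to the original labels for one column.
restored = encoders["pet"].inverse_transform(encoded["pet"])
```

Keeping the fitted encoders around is a design choice: it allows `inverse_transform` later and lets the same mapping be applied to new data.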

310 votes, 15 answers

How to normalize a numpy array to a unit vector

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function: def normalize(v): norm = np.linalg.norm(v) if norm == 0: return v return v /…

python numpy scikit-learn statistics normalization

asked Jan 09 '14 at 20:25

Donbeo

17,067
37
114
188
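
For the unit-vector question above, a minimal NumPy-only sketch follows. It assumes the truncated function in the excerpt simply divides by the L2 norm; that completion is an assumption, not something the excerpt shows.

```python
import numpy as np

def normalize(v):
    """Scale v to unit L2 norm; a zero vector is returned unchanged."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm

# Example input chosen for illustration: a 3-4-5 right triangle.
unit = normalize([3.0, 4.0])
```

scikit-learn also offers `sklearn.preprocessing.normalize`, which works row by row on 2D arrays rather than on a single 1D vector.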

260 votes, 27 answers

sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I am using sklearn and having a problem with the affinity propagation. I have built an input matrix and I keep getting the following error. ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). I have…

python python-2.7 scikit-learn valueerror

asked Jul 09 '15 at 16:40

Ethan Waldie

2,799
2
12
14
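
For the NaN/infinity error above, a first diagnostic step is usually to locate the non-finite entries before passing the matrix to an estimator. This is a rough sketch on a made-up array; `X`, `finite_rows`, and `X_clean` are hypothetical names, and dropping rows is only one of several possible remedies (imputation is another).

```python
import numpy as np

# Hypothetical input matrix containing one NaN and one infinity.
X = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [5.0, np.inf],
])

# True for rows in which every entry is a finite number.
finite_rows = np.isfinite(X).all(axis=1)

# Keep only the fully finite rows.
X_clean = X[finite_rows]
```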

260 votes, 14 answers

Is there a library function for Root mean square error (RMSE) in python?

I know I could implement a root mean squared error function like this: def rmse(predictions, targets): return np.sqrt(((predictions - targets) ** 2).mean()) What I'm looking for is whether this rmse function is implemented in a library somewhere,…

python scikit-learn scipy

asked Jun 19 '13 at 17:24

siamii

23,374
28
93
143
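
For the RMSE question above, one option that should work across scikit-learn versions is to take the square root of `sklearn.metrics.mean_squared_error`. The sample arrays below are made up for illustration, and the NumPy line mirrors the formula quoted in the question.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical sample data.
targets = np.array([3.0, -0.5, 2.0, 7.0])
predictions = np.array([2.5, 0.0, 2.0, 8.0])

# RMSE as the square root of scikit-learn's mean squared error.
rmse_sklearn = np.sqrt(mean_squared_error(targets, predictions))

# The same quantity computed directly with NumPy.
rmse_numpy = np.sqrt(((predictions - targets) ** 2).mean())
```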

257 votes, 15 answers

ImportError: No module named sklearn.cross_validation

I am using python 2.7 in Ubuntu 14.04. I installed scikit-learn, numpy and matplotlib with these commands: sudo apt-get install build-essential python-dev python-numpy \ python-numpy-dev python-scipy libatlas-dev g++ python-matplotlib…

python scikit-learn

asked Jun 05 '15 at 13:15

arthurckl

5,281
6
17
16
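
For the `sklearn.cross_validation` import error above: that module was deprecated in scikit-learn 0.18 and removed in 0.20, with its contents moved to `sklearn.model_selection`. A small sketch of an import that tolerates both layouts follows; the sample list is hypothetical.

```python
try:
    # Location in scikit-learn 0.18 and later.
    from sklearn.model_selection import train_test_split
except ImportError:
    # Fallback for very old installations only.
    from sklearn.cross_validation import train_test_split

# Hypothetical data: ten samples, 20% held out for testing.
samples = list(range(10))
train, test = train_test_split(samples, test_size=0.2, random_state=0)
```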

254 votes, 6 answers

Save classifier to disk in scikit-learn

How do I save a trained Naive Bayes classifier to disk and use it to predict data? I have the following sample program from the scikit-learn website: from sklearn import datasets iris = datasets.load_iris() from sklearn.naive_bayes import…

python machine-learning scikit-learn classification

asked May 15 '12 at 00:06

garak

4,713
9
39
56
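
For the question above about saving a classifier, a minimal sketch using the standard-library `pickle` module follows. The excerpt is truncated before it names the estimator, so `GaussianNB` is an assumption here, and the temporary file path is purely illustrative.

```python
import os
import pickle
import tempfile

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB  # assumed estimator, see lead-in

iris = datasets.load_iris()
clf = GaussianNB().fit(iris.data, iris.target)

# Illustrative location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")

# Serialize the fitted model to disk.
with open(path, "wb") as f:
    pickle.dump(clf, f)

# Load it back and use it for prediction.
with open(path, "rb") as f:
    restored = pickle.load(f)

predictions = restored.predict(iris.data)
```

Pickled models should only be loaded from trusted sources and, ideally, with the same scikit-learn version that wrote them. `joblib.dump` and `joblib.load` are a frequently used alternative for estimators holding large NumPy arrays.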

248 votes, 13 answers

How to split data into 3 sets (train, validation and test)?

I have a pandas dataframe and I wish to divide it into 3 separate sets. I know that using train_test_split from sklearn.cross_validation, one can divide the data into two sets (train and test). However, I couldn't find any solution about splitting the…

pandas numpy dataframe machine-learning scikit-learn

asked Jul 07 '16 at 16:26

CentAu

10,660
15
59
85
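
For the three-way split question above, one simple approach is to call `train_test_split` twice. The sketch below assumes a 60/20/20 split and a made-up DataFrame; both are illustrative choices, and the function is imported from `sklearn.model_selection`, its current location.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame with 100 rows.
df = pd.DataFrame({"x": np.arange(100), "y": np.arange(100) % 2})

# First split: 60% train, 40% held back.
train, rest = train_test_split(df, test_size=0.4, random_state=0)

# Second split: divide the held-back 40% evenly into validation and test.
validation, test = train_test_split(rest, test_size=0.5, random_state=0)
```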

241 votes, 10 answers

Find p-value (significance) in scikit-learn LinearRegression

How can I find the p-value (significance) of each coefficient? lm = sklearn.linear_model.LinearRegression() lm.fit(x,y)

python scikit-learn statistics regression hypothesis-test

asked Jan 13 '15 at 17:46

elplatt

3,227
3
18
20

235 votes, 9 answers

pandas dataframe columns scaling with sklearn

I have a pandas dataframe with mixed type columns, and I'd like to apply sklearn's min_max_scaler to some of the columns. Ideally, I'd like to do these transformations in place, but haven't figured out a way to do that yet. I've written the…

python pandas scikit-learn dataframe

asked Jul 09 '14 at 03:57

flyingmeatball

7,457
7
44
62
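
For the column-scaling question above, a minimal sketch is to select the numeric columns, scale them with `MinMaxScaler`, and assign the result back. The DataFrame and the `numeric_cols` list are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical mixed-type frame: one string column, two numeric columns.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "height": [150.0, 175.0, 200.0],
    "weight": [50.0, 60.0, 90.0],
})

numeric_cols = ["height", "weight"]

# Scale only the selected columns to [0, 1] and write them back.
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])
```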

228 votes, 10 answers

Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

python machine-learning cluster-analysis k-means scikit-learn

asked Apr 03 '11 at 12:39

bmasc

2,410
2
15
9

228 votes, 9 answers

A column-vector y was passed when a 1d array was expected

I need to fit RandomForestRegressor from sklearn.ensemble. forest = ensemble.RandomForestRegressor(**RF_tuned_parameters) model = forest.fit(train_fold, train_y) yhat = model.predict(test_fold) This code always worked until I made some…

python pandas numpy scikit-learn

asked Dec 08 '15 at 20:47

Klausos Klausos

15,308
51
135
217
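
For the column-vector warning above, the usual cause is a target of shape (n_samples, 1) where a 1D array of shape (n_samples,) is expected. A rough sketch of flattening the target with `ravel()` follows; the random data and the `n_estimators` value are made up, and the variable names only echo those in the excerpt.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
train_fold = rng.rand(20, 3)   # hypothetical features
train_y = rng.rand(20, 1)      # column vector, shape (20, 1)

forest = RandomForestRegressor(n_estimators=10, random_state=0)

# ravel() flattens the target to shape (20,) before fitting.
model = forest.fit(train_fold, train_y.ravel())
yhat = model.predict(train_fold)
```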

222 votes, 19 answers

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Importing from pyxdameraulevenshtein gives the following error. I have pyxdameraulevenshtein==1.5.3, pandas==1.1.4, and scikit-learn==0.20.2; NumPy is 1.16.1. It works well in Python 3.6 but fails in Python 3.7. Has anyone been facing similar issues with…

python pandas numpy scikit-learn python-3.7

asked Feb 05 '21 at 09:11

Sachit Jani

2,321
2
4
5

207 votes, 25 answers

How to extract the decision rules from scikit-learn decision-tree?

Can I extract the underlying decision-rules (or 'decision paths') from a trained tree in a decision tree as a textual list? Something like: if A>0.4 then if B<0.2 then if C>0.8 then class='X'

python machine-learning scikit-learn decision-tree random-forest

asked Nov 26 '13 at 17:58

Dror Hilman

6,837
9
39
56
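
For the decision-rules question above, recent scikit-learn versions (0.21 and later) provide `sklearn.tree.export_text`, which renders a fitted tree as indented text. The sketch below uses the iris dataset and a depth limit of 2 purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Shallow tree so the printed rules stay short.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Textual rendering of the learned splits and leaf classes.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```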

202 votes, 8 answers

Random state (Pseudo-random number) in Scikit learn

I want to implement a machine learning algorithm in scikit-learn, but I don't understand what the random_state parameter does. Why should I use it? I also could not understand what a pseudo-random number is.

python random scikit-learn

asked Jan 21 '15 at 10:17

Elizabeth Susan Joseph

6,255
7
20
23
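
For the `random_state` question above, a small sketch can show the effect: passing the same integer seed makes a randomized operation such as `train_test_split` produce the same result on every call. The data and the seed value 42 are arbitrary choices for illustration.

```python
from sklearn.model_selection import train_test_split

# Hypothetical data: ten integers.
data = list(range(10))

# Two calls with the same seed yield the same shuffle and the same split.
a_train, a_test = train_test_split(data, test_size=0.3, random_state=42)
b_train, b_test = train_test_split(data, test_size=0.3, random_state=42)
```

Leaving `random_state` unset lets the split vary between runs, which is why a fixed seed is commonly used when results need to be reproducible.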

198 votes, 15 answers

ModuleNotFoundError: No module named 'sklearn'

I want to import sklearn but there is no module apparently: ModuleNotFoundError: No module named 'sklearn' I am using Anaconda and Python 3.6.1; I have checked everywhere but still can't find answers. When I use the command: conda install…

python scikit-learn anaconda package conda

asked Sep 08 '17 at 09:56

Hareez Rana

1,993
2
7
3
