Questions tagged [scikit-learn]

scikit-learn is a machine-learning library for Python that provides simple and efficient tools for data analysis and data mining. It is accessible to everybody and reusable in various contexts. It is built on NumPy and SciPy. The project is open source and commercially usable (BSD license).

Related Libraries

  • sklearn-pandas - bridge library between scikit-learn and pandas
  • scikit-image - scikit-learn-compatible API for image processing and computer vision for machine learning tasks
  • sklearn laboratory - scikit-learn wrapper that enables running larger scikit-learn experiments and feature sets
  • sklearn deap - scikit-learn wrapper that enables hyperparameter tuning using evolutionary algorithms instead of grid search
  • hyperopt-sklearn - hyperparameter optimization for scikit-learn
  • scikit-plot - visualization library for quickly generating common plots in machine learning studies
  • sklearn-porter - library for turning trained scikit-learn models into compiled C, Java, or JavaScript code
  • sklearn_theano - scikit-learn-compatible objects (estimators, transformers, and datasets) using Theano internally
  • sparkit-learn - scikit-learn API that uses Spark's distributed computing model
  • joblib - parallelization library used by scikit-learn
28024 questions
320 votes, 25 answers

Label encoding across multiple columns in scikit-learn

I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. As the dataframe has many (50+) columns, I want to avoid creating a LabelEncoder object for each column; I'd rather just have one big LabelEncoder object…
Bryan (reputation 5,999; badges: 9 gold, 29 silver, 50 bronze)
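A minimal sketch of one way to do this, assuming scikit-learn 0.20 or later: `LabelEncoder` is documented for encoding target values (y), whereas `OrdinalEncoder` fits a single encoder object across several feature columns at once. The DataFrame and its column names are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical example data: two string-valued columns
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "L", "M"]})

# One encoder learns a separate sorted category list for each column
encoder = OrdinalEncoder()
encoded = pd.DataFrame(encoder.fit_transform(df), columns=df.columns, index=df.index)
print(encoded)

# The same fitted object maps the codes back to the original strings
print(encoder.inverse_transform(encoded.to_numpy()))
```

Because the per-column categories are stored in `encoder.categories_`, the same object can later transform new rows with the same columns.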
310 votes, 15 answers

How to normalize a numpy array to a unit vector

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function: def normalize(v): norm = np.linalg.norm(v) if norm == 0: return v return v /…
Donbeo (reputation 17,067; badges: 37 gold, 114 silver, 188 bronze)
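scikit-learn's `sklearn.preprocessing.normalize` can do this, but it works row-wise on 2-D input, so a 1-D vector has to be reshaped into a single row first. A sketch (the helper name `unit_vector` is hypothetical); like the hand-written version in the question, an all-zero vector comes back as zeros instead of raising a division error.

```python
import numpy as np
from sklearn.preprocessing import normalize

def unit_vector(v):
    """Scale v to unit L2 norm; an all-zero vector is left as zeros."""
    row = np.asarray(v, dtype=float).reshape(1, -1)  # shape (1, n): a single sample
    return normalize(row, norm="l2")[0]              # back to shape (n,)

print(unit_vector([3.0, 4.0]))  # [0.6 0.8]
print(unit_vector([0.0, 0.0]))  # [0. 0.]
```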
260 votes, 27 answers

sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I am using sklearn and having a problem with the affinity propagation. I have built an input matrix and I keep getting the following error. ValueError: Input contains NaN, infinity or a value too large for dtype('float64'). I have…
Ethan Waldie (reputation 2,799; badges: 2 gold, 12 silver, 14 bronze)
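This error is raised by scikit-learn's input validation, which rejects any non-finite value (values too large for float64 typically end up as infinity). A sketch of locating the offending cells and, as one possible remedy, filling them with column means via `SimpleImputer`; the matrix is made up, and whether mean imputation is appropriate depends on your data.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical input matrix containing one NaN and one infinity
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [np.inf, 4.0],
              [5.0, 6.0]])

# Report the (row, column) positions of NaN or +/-inf entries
print("non-finite cells:", np.argwhere(~np.isfinite(X)).tolist())

# Treat infinities as missing, then fill each missing cell with its column mean
X = np.where(np.isfinite(X), X, np.nan)
X_clean = SimpleImputer(strategy="mean").fit_transform(X)
print(X_clean)
```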
260 votes, 14 answers

Is there a library function for Root mean square error (RMSE) in python?

I know I could implement a root mean squared error function like this: def rmse(predictions, targets): return np.sqrt(((predictions - targets) ** 2).mean()) What I'm looking for is whether this rmse function is implemented in a library somewhere,…
siamii (reputation 23,374; badges: 28 gold, 93 silver, 143 bronze)
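One library-based option is to take the square root of `sklearn.metrics.mean_squared_error`. A sketch with made-up numbers; recent releases (1.4 and later) also provide `sklearn.metrics.root_mean_squared_error`, but the version below also works on older installs.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(targets, predictions):
    # Square root of scikit-learn's mean squared error
    return float(np.sqrt(mean_squared_error(targets, predictions)))

targets = [3.0, -0.5, 2.0, 7.0]
predictions = [2.5, 0.0, 2.0, 8.0]
print(rmse(targets, predictions))  # sqrt(0.375), about 0.612
```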
257 votes, 15 answers

ImportError: No module named sklearn.cross_validation

I am using python 2.7 in Ubuntu 14.04. I installed scikit-learn, numpy and matplotlib with these commands: sudo apt-get install build-essential python-dev python-numpy \ python-numpy-dev python-scipy libatlas-dev g++ python-matplotlib…
arthurckl (reputation 5,281; badges: 6 gold, 17 silver, 16 bronze)
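`sklearn.cross_validation` was deprecated in scikit-learn 0.18 and removed in 0.20; its splitters and helpers now live in `sklearn.model_selection`. A sketch of an import that prefers the new module and only falls back on very old installations (the toy arrays are made up):

```python
import numpy as np

try:
    # scikit-learn 0.18 and later
    from sklearn.model_selection import train_test_split
except ImportError:
    # Fallback for old installations that predate model_selection
    from sklearn.cross_validation import train_test_split

X = np.arange(16).reshape(8, 2)
y = np.arange(8)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)  # (6, 2) (2, 2)
```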
254 votes, 6 answers

Save classifier to disk in scikit-learn

How do I save a trained Naive Bayes classifier to disk and use it to predict data? I have the following sample program from the scikit-learn website: from sklearn import datasets iris = datasets.load_iris() from sklearn.naive_bayes import…
garak (reputation 4,713; badges: 9 gold, 39 silver, 56 bronze)
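A sketch of persisting a fitted classifier with `joblib`, which is installed as a scikit-learn dependency (older answers use `sklearn.externals.joblib`, which has since been removed). The temporary file path is only for illustration. As with pickle, only load files you trust, and ideally load them with the same scikit-learn version that saved them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
clf = GaussianNB().fit(iris.data, iris.target)

# Save the fitted model to disk, then load it back as a new object
path = os.path.join(tempfile.mkdtemp(), "gaussian_nb.joblib")
joblib.dump(clf, path)
restored = joblib.load(path)

# The restored classifier makes the same predictions as the original
print(np.array_equal(clf.predict(iris.data), restored.predict(iris.data)))  # True
```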
248 votes, 13 answers

How to split data into 3 sets (train, validation and test)?

I have a pandas dataframe and I wish to divide it into 3 separate sets. I know that using train_test_split from sklearn.cross_validation, one can divide the data in two sets (train and test). However, I couldn't find any solution about splitting the…
CentAu (reputation 10,660; badges: 15 gold, 59 silver, 85 bronze)
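One common approach is to call `train_test_split` twice: first hold out the test set, then split the remainder into training and validation sets. A sketch assuming a 60/20/20 split and a made-up 100-row DataFrame (it imports from `sklearn.model_selection`, the current home of `train_test_split`):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical DataFrame with 100 rows
df = pd.DataFrame({"x": np.arange(100), "y": np.arange(100) % 2})

# First hold out 20% of the rows as the test set ...
train_val, test = train_test_split(df, test_size=0.2, random_state=0)
# ... then take 25% of the remaining 80% (20% of the total) as validation
train, val = train_test_split(train_val, test_size=0.25, random_state=0)

print(len(train), len(val), len(test))  # 60 20 20
```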
241 votes, 10 answers

Find p-value (significance) in scikit-learn LinearRegression

How can I find the p-value (significance) of each coefficient? lm = sklearn.linear_model.LinearRegression() lm.fit(x,y)
elplatt (reputation 3,227; badges: 3 gold, 18 silver, 20 bronze)
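`LinearRegression` does not report p-values. statsmodels' OLS is one option; another, sketched below, computes the classical OLS t-test from the fitted coefficients with NumPy and SciPy. It assumes `fit_intercept=True`, a 1-D target, and a full-rank design matrix; the helper `coef_p_values` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

def coef_p_values(model, X, y):
    """Two-sided t-test p-values for [intercept, coef_1, ..., coef_k] of a fitted OLS model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])              # prepend the intercept column
    params = np.concatenate([[model.intercept_], model.coef_])
    residuals = y - design @ params
    dof = n - k - 1                                         # residual degrees of freedom
    sigma2 = residuals @ residuals / dof                    # unbiased noise-variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)         # covariance of the estimates
    se = np.sqrt(np.diag(cov))                              # standard errors
    t_stats = params / se
    return 2 * stats.t.sf(np.abs(t_stats), dof)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=200)  # only the first feature matters
lm = LinearRegression().fit(X, y)
print(coef_p_values(lm, X, y))  # order: intercept, x1, x2
```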
235 votes, 9 answers

pandas dataframe columns scaling with sklearn

I have a pandas dataframe with mixed type columns, and I'd like to apply sklearn's min_max_scaler to some of the columns. Ideally, I'd like to do these transformations in place, but haven't figured out a way to do that yet. I've written the…
flyingmeatball (reputation 7,457; badges: 7 gold, 44 silver, 62 bronze)
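A sketch of scaling only selected columns and writing the result back into the same DataFrame, leaving the other columns untouched. The DataFrame and column names are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical mixed-type DataFrame
df = pd.DataFrame({"name": ["a", "b", "c"],
                   "height": [150.0, 165.0, 180.0],
                   "weight": [50.0, 70.0, 90.0]})

cols = ["height", "weight"]                  # only the numeric columns to scale
scaler = MinMaxScaler()
df[cols] = scaler.fit_transform(df[cols])    # overwrite those columns in df
print(df)
```

Keeping the fitted `scaler` around lets you apply the same min/max to new data later with `scaler.transform`.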
228 votes, 10 answers

Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

bmasc (reputation 2,410; badges: 2 gold, 15 silver, 9 bronze)
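scikit-learn's `KMeans` only supports Euclidean distance: its centroid update (the mean) minimizes squared Euclidean distance, not an arbitrary metric. A common workaround is k-medoids, which uses actual data points as cluster centers and therefore works with any distance (the third-party scikit-learn-extra package offers a `KMedoids` estimator). The sketch below is a small, hypothetical implementation of the alternating k-medoids heuristic, a swapped-in technique rather than K-Means itself; it relies on `sklearn.metrics.pairwise_distances`, which accepts a Python callable as `metric`.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def k_medoids(X, k, metric, n_iter=100, seed=0):
    """Alternating k-medoids with an arbitrary distance; returns (medoid indices, labels)."""
    D = pairwise_distances(X, metric=metric)        # full n x n distance matrix
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign each point to its nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: the member with the smallest total distance to its cluster
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break                                    # medoids unchanged: converged
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

def manhattan(a, b):
    # Example custom distance; any function of two 1-D arrays works here
    return np.abs(a - b).sum()

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
medoids, labels = k_medoids(X, k=2, metric=manhattan)
print(medoids, labels)
```

Computing the full distance matrix keeps the sketch short but costs O(n²) memory, so it suits small to medium datasets.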
228 votes, 9 answers

A column-vector y was passed when a 1d array was expected

I need to fit RandomForestRegressor from sklearn.ensemble. forest = ensemble.RandomForestRegressor(**RF_tuned_parameters) model = forest.fit(train_fold, train_y) yhat = model.predict(test_fold) This code always worked until I made some…
Klausos Klausos (reputation 15,308; badges: 51 gold, 135 silver, 217 bronze)
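This message is a `DataConversionWarning`: the target was passed with shape `(n_samples, 1)` instead of `(n_samples,)`, which typically happens when `train_y` is a one-column DataFrame or a reshaped array. Flattening it with `np.ravel` (or selecting the column as a Series) avoids it. A sketch with made-up data that reuses the question's variable names:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
train_fold = rng.normal(size=(50, 3))
train_y = (2.0 * train_fold[:, 0]).reshape(-1, 1)  # column vector, shape (50, 1)

# Flatten the target to shape (50,) before fitting
y = np.ravel(train_y)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(train_fold, y)
yhat = model.predict(train_fold)
print(train_y.shape, y.shape, yhat.shape)  # (50, 1) (50,) (50,)
```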
222 votes, 19 answers

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Importing from pyxdameraulevenshtein gives the following error. I have pyxdameraulevenshtein==1.5.3, pandas==1.1.4, and scikit-learn==0.20.2; NumPy is 1.16.1. It works well in Python 3.6 but fails in Python 3.7. Has anyone been facing similar issues with…
Sachit Jani (reputation 2,321; badges: 2 gold, 4 silver, 5 bronze)
207 votes, 25 answers

How to extract the decision rules from scikit-learn decision-tree?

Can I extract the underlying decision-rules (or 'decision paths') from a trained tree in a decision tree as a textual list? Something like: if A>0.4 then if B<0.2 then if C>0.8 then class='X'
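Since scikit-learn 0.21, `sklearn.tree.export_text` prints a fitted tree as indented text rules, one line per split or leaf. A sketch on the bundled iris data (the depth limit is only there to keep the output short):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each line looks like "|--- <feature> <= <threshold>" or "|   |--- class: <label>"
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

For a custom format such as nested `if ... then` statements, the same information is available from `clf.tree_` (its `feature`, `threshold`, `children_left`, and `children_right` arrays).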
202 votes, 8 answers

Random state (Pseudo-random number) in Scikit learn

I want to implement a machine learning algorithm in scikit-learn, but I don't understand what the parameter random_state does. Why should I use it? I also don't understand what a pseudo-random number is.
Elizabeth Susan Joseph (reputation 6,255; badges: 7 gold, 20 silver, 23 bronze)
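`random_state` seeds the pseudo-random number generator that an estimator or splitter uses. A pseudo-random generator produces a deterministic sequence that only looks random, so the same seed reproduces the same "random" choices, while leaving it as `None` typically gives different results on each run. A sketch with made-up data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)

# Same seed -> identical shuffle, so the split is reproducible across runs
_, test_a = train_test_split(X, test_size=0.3, random_state=42)
_, test_b = train_test_split(X, test_size=0.3, random_state=42)
print(np.array_equal(test_a, test_b))  # True

# The same idea with NumPy's generator: a fixed seed gives a fixed sequence
print(np.random.default_rng(7).random(3))
print(np.random.default_rng(7).random(3))  # identical to the line above
```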
198 votes, 15 answers

ModuleNotFoundError: No module named 'sklearn'

I want to import sklearn but there is no module apparently: ModuleNotFoundError: No module named 'sklearn' I am using Anaconda and Python 3.6.1; I have checked everywhere but still can't find answers. When I use the command: conda install…
Hareez Rana (reputation 1,993; badges: 2 gold, 7 silver, 3 bronze)
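The module is imported as `sklearn` but installed under the package name `scikit-learn` (`conda install scikit-learn` or `pip install scikit-learn`), and it must be installed into the same interpreter that runs the code; with Anaconda, a frequent cause is an IDE or notebook kernel using a different environment. A sketch that reports which interpreter is running and whether it can see `sklearn` (the helper name `sklearn_status` is hypothetical):

```python
import importlib.util
import sys

def sklearn_status():
    """Return (found, message) for the interpreter running this code."""
    spec = importlib.util.find_spec("sklearn")
    if spec is None:
        # Running pip through this interpreter installs into the right environment
        return False, f"sklearn not found; try: {sys.executable} -m pip install scikit-learn"
    return True, f"sklearn found at {spec.origin} (interpreter: {sys.executable})"

found, message = sklearn_status()
print(message)
```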