I am testing this code.

# Import the necessary packages
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer
from sklearn.cluster import KMeans
# Define a normalizer
normalizer = Normalizer()
# Create Kmeans model
kmeans = KMeans(n_clusters = 10,max_iter = 1000)
# Make a pipeline chaining normalizer and kmeans
pipeline = make_pipeline(normalizer,kmeans)
# Fit pipeline to daily stock movements
pipeline.fit(score)
labels = pipeline.predict(score)

This line throws an error:

pipeline.fit(score)

Here is the error that I see:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I don't know what this error means. I Googled it and didn't find anything useful. Here is a small sample of my data:

array=[1. 1. 1. ... 8. 1. 1.].

I am following the example from the link below.

https://medium.com/datadriveninvestor/stock-market-clustering-with-k-means-clustering-in-python-4bf6bd5bd685

When I run the code from the link, everything works fine. I'm not sure why it falls down when I run the code on my own data, which is just:

1, 1.9, 2.62, 3.5, 4.1, 7.7, 9.75, etc, etc.  

It goes from 1-10. That's all it is.

ASH
  • Just reshape it like it says. scikit-learn needs a two-dimensional array here. You can check the shape with `array.shape`. Yours is probably (n,), but it needs to be (n, 1). Try `array = array.reshape(-1, 1)` – Mark Moretto Jan 03 '20 at 01:30
  • Yeah, that works, but what was the actual issue? I haven't seen that before. – ASH Jan 03 '20 at 02:03
  • I think it has to do with defined dimensionality. The -1 in reshape acts as a 'wildcard' and tells `numpy` to figure it out. Since matrices are (row, column) format, we told reshape to ensure that there is 1 column and an unknown number of rows. Now, if you have `array=np.array([1., 1., 1., 8., 1., 1.,])`, which is 6 elements long, and then run `array.reshape(-1, 2)`, `numpy` will automatically output a matrix with the dimensions (3, 2). Our column requirement (2) is met and `numpy` figured out the rest. Note: You cannot run `array.reshape(-1, -1)` since one dimension must be known. – Mark Moretto Jan 03 '20 at 13:06
  • Does this answer your question? [Preprocessing in scikit learn - single sample - Depreciation warning](https://stackoverflow.com/questions/35082140/preprocessing-in-scikit-learn-single-sample-depreciation-warning) – AMC Feb 08 '20 at 01:14
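
The reshape behaviour described in the comments can be checked with a short sketch. It reuses the six-element array from the comment above; the variable names `reshaped` and `error_message` are only illustrative.

```python
import numpy as np

# The six-element example array from the comment above.
array = np.array([1., 1., 1., 8., 1., 1.])

# -1 asks numpy to infer the row count from the fixed column count (2).
reshaped = array.reshape(-1, 2)
print(reshaped.shape)

# Only one dimension may be left unknown, so two -1 values raise ValueError.
error_message = None
try:
    array.reshape(-1, -1)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```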

2 Answers

Any scikit-learn transformer expects an array of shape [n_samples, n_features]. So there are two scenarios in which you will have to reshape your data:

  • If you only have a single sample, you need to reshape it to an array of shape [1, n_features]
  • If you only have a single feature, you need to reshape it to an array of shape [n_samples, 1]
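
As a minimal sketch of the two cases, assuming the data is a 1-D NumPy array (the `values` below are made up for illustration and are not the asker's data):

```python
import numpy as np

# Made-up 1-D data with shape (4,).
values = np.array([1.0, 1.9, 2.62, 3.5])

# Single feature, many samples: one column, shape (4, 1).
as_samples = values.reshape(-1, 1)

# Single sample, many features: one row, shape (1, 4).
as_single_sample = values.reshape(1, -1)

print(values.shape, as_samples.shape, as_single_sample.shape)
```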

So you need to do what suits the problem. You are passing a 1D vector.

[1. 1. 1. ... 8. 1. 1.]

If this is a single sample, reshape it to a (1, -1) shaped array and this error will go away. That said, you might want to think about the following.

  • If this is a single sample, there's no point in fitting a model with a single sample. You won't get any benefit.
  • If this is a set of samples with a single feature, I don't really see a benefit in doing K-means on such a dataset.
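
To illustrate the single-feature case, here is a small sketch, assuming the values sit in a 1-D NumPy array (the `score` values below are made up to resemble the sample in the question). It reshapes the data to one column so it has the expected two dimensions. It also shows a side effect worth knowing: `Normalizer` rescales each row to unit norm, so a single positive feature becomes 1.0 in every row, which leaves K-means nothing to separate.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Hypothetical stand-in for the question's data: values between 1 and 10.
score = np.array([1.0, 1.9, 2.62, 3.5, 4.1, 7.7, 9.75])

# One column, one row per value: shape (7, 1).
X = score.reshape(-1, 1)

# Normalizer scales each sample (row) to unit L2 norm.
normalized = Normalizer().fit_transform(X)

print(X.shape)
print(normalized.ravel())
```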
thushv89

The problem may be with the format of your data. Most models expect two-dimensional input, such as a data frame.

Mayowa Ayodele