
I am trying to use KNN to correctly classify .wav files into two groups, group 0 and group 1.

I extracted the data, created the model, and fit the model; however, when I run the script I get the following error:

```
Traceback (most recent call last):
  File "/..../....../KNN.py", line 20, in <module>
    classifier.fit(X_train, y_train)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/neighbors/base.py", line 761, in fit
    X, y = check_X_y(X, y, "csr", multi_output=True)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/validation.py", line 521, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/validation.py", line 405, in check_array
    % (array.ndim, estimator_name))
ValueError: Found array with dim 3. Estimator expected <= 2.
```

I have found these two stackoverflow posts which describe similar issues:

sklearn Logistic Regression "ValueError: Found array with dim 3. Estimator expected <= 2."

Error: Found array with dim 3. Estimator expected <= 2

And, correct me if I'm wrong, but it appears that scikit-learn can only accept 2-dimensional data.

My training data has shape (3240, 20, 5255), which consists of:

  • 3240 .wav files in this dataset (this is index 0 of the training data).
  • For each .wav file, a (20, 5255) numpy array which represents the MFCC coefficients (MFCC coefficients try to represent the sound in a numeric way).

My label data has shape (3240,), where each category is 0 or 1.

What code can I use to manipulate my training and testing data to convert it into a form that is usable by scikit-learn? Also, how can I ensure that data is not lost when I go down from 3 dimensions to 2 dimensions?

    You need to convert the MFCC array to a single dimension and then the shape will be `(3240, 20*5255)` – Vivek Kumar Dec 28 '17 at 07:34
  • Or maybe explain more about the inner 2-D array of shape (20, 5255). What do the rows and columns represent? Can you take a single representative number from each column or row? – Vivek Kumar Dec 28 '17 at 07:35

1 Answer


It is true: sklearn works only with 2D data.

What you can try to do:

  • Just use np.reshape on the training data to convert it to shape (3240, 20*5255). It will preserve all the original information. But sklearn will not be able to exploit the implicit structure in this data (e.g. that features 1, 21, 41, etc. are different versions of the same variable).
  • Build a convolutional neural network on your original data (e.g. with the tensorflow+Keras stack). CNNs were designed specifically to handle such multidimensional data and exploit its structure. But they have lots of hyperparameters to tune.
  • Use dimensionality reduction (e.g. PCA) on the data reshaped to (3240, 20*5255). It will try to preserve as much information as possible, while still keeping the number of features low.
  • Use manual feature engineering to extract specific information from the data structure (e.g. descriptive statistics along each dimension), and train your model on such features.
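A minimal sketch of the first option. The array shapes here are shrunk from the question's (3240, 20, 5255) so the snippet runs quickly, and random numbers stand in for the real MFCC data; the reshape logic is the same either way. Note that reshape only rearranges values, so it is fully reversible and loses no information:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data with the same layout as the question's arrays,
# shrunk here so the sketch runs quickly.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 20, 55))      # stand-in for (3240, 20, 5255)
y_train = rng.integers(0, 2, size=40)        # labels, 0 or 1

# Flatten each (20, 55) MFCC array into one row of features.
X_train_2d = X_train.reshape(X_train.shape[0], -1)   # -> (40, 20 * 55)

# Sanity check: the reshape is reversible, so nothing is lost.
assert np.array_equal(X_train_2d.reshape(X_train.shape), X_train)

classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train_2d, y_train)
```

With the real data the reshape target would be `X_train.reshape(3240, -1)`, giving shape (3240, 105100).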

If you had more data (e.g. 100K examples), the first approach might work best. In your case (3K examples and over 100K features) you need to regularize your model heavily to avoid overfitting.
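The third option (reshape, then PCA, then a regularized linear model) can be sketched as a pipeline. Again the data here is a small random stand-in for the real MFCC arrays, and the component count and regularization strength are placeholder values to tune, not recommendations:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small random stand-in, already flattened from 3D to 2D.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20, 55)).reshape(60, -1)
y = rng.integers(0, 2, size=60)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = make_pipeline(
    StandardScaler(),                          # PCA is scale-sensitive
    PCA(n_components=20),                      # cut the feature count
    LogisticRegression(C=0.1, max_iter=1000),  # smaller C = stronger regularization
)
model.fit(X_tr, y_tr)

# Compare train vs. test accuracy: a large gap suggests overfitting.
print(model.score(X_tr, y_tr), model.score(X_te, y_te))
```

Comparing the two scores is exactly the overfitting check suggested in the comments below the answer.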

  • Do you recommend any specific libraries to perform PCA? Also, after using PCA can I plug the data into scikit learn models and see an improvement on accuracy? – Sreehari R Dec 28 '17 at 15:58
  • Scikit-learn has a good implementation of PCA - http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html – David Dale Dec 28 '17 at 16:02
  • PCA helps to decrease the number of features. With linear models (like logistic regression), it will DECREASE accuracy on training data, but MAYBE increase accuracy on test data; you should test it. I would recommend starting with the first option: fit the model on the reshaped data, and compare accuracy on the train and test datasets. If the difference is large, then overfitting has occurred, and techniques like PCA might be required. – David Dale Dec 28 '17 at 16:06
  • @DavidDale, hello! Can you help me please? After reshaping my training data, I got "cannot reshape array of size 22609920 into shape (115,65536)". I have 115 images in my training data set and their size is 256x256. – hyper-cookie Jun 14 '21 at 17:49
    @hyper-cookie your images are 3-dimensional because they are in RGB format. You can either increase the vector dimension x3 (196608 instead of 65536) or make the images grayscale by averaging along the channel dimension. Another viable option is to use a pretrained image-2-vector model, such as https://github.com/christiansafka/img2vec or https://huggingface.co/openai/clip-vit-base-patch32 – David Dale Jun 15 '21 at 11:27
  • @DavidDale, hello! Thank you for your answer, but I've got another problem. Can you please look at my last and first questions in my profile? I am stuck on this and I don't see any way to solve it. – hyper-cookie Jun 15 '21 at 17:05