I currently have a list that contains all of the input for an sklearn classifier. Each element of that list is itself a list of features representing one song in my dataset.

I need to convert this structure to a 2D numpy array so I can scale my data via sklearn's preprocessing. This is proving to be very difficult.

    y = []
    all_feats = []
    for song in data:
        mfccs_in_song = song[0]
        oned_mfccs_in_song = []
        for frame in mfccs_in_song:
            for m in frame:
                oned_mfccs_in_song.append(m)
        all_feats.append(oned_mfccs_in_song)
        label = song[-1]
        y.append(label)

Long story short, `all_feats` is that list of lists, and it has a length of 600. How can I convert it to a numpy array for preprocessing? I have tried several things, including simply `all_feats = np.array(all_feats)`, but that does not work.

ohbrobig
  • What is the issue with using `all_feats = np.array(all_feats)`? Does it give an error? What error? – Antimony May 16 '17 at 00:35
  • File "/Library/Python/2.7/site-packages/sklearn/preprocessing/data.py", line 129, in scale dtype=FLOAT_DTYPES) File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", line 382, in check_array array = np.array(array, dtype=dtype, order=order, copy=copy) ValueError: setting an array element with a sequence. – ohbrobig May 16 '17 at 00:39
  • Take a look at the suggestions/answers given [here](http://stackoverflow.com/questions/4674473/valueerror-setting-an-array-element-with-a-sequence). More specifically, look at the resulting `all_feats`. It may not have sublists of the same size. – Antimony May 16 '17 at 00:40
  • You were right. There was one sublist out of the 600 that wasn't the same size! – ohbrobig May 16 '17 at 00:58
  • Great, I'll add my suggestion as an answer and you can accept it :D – Antimony May 16 '17 at 01:00

1 Answer

That error suggests that the sublists of `all_feats` are not all the same size. Inspect its contents, work out the correct sublist length, and decide how to prune the extra elements. After that, `all_feats = np.array(all_feats)` should work.

See the answers to [this question](http://stackoverflow.com/questions/4674473/valueerror-setting-an-array-element-with-a-sequence) for more explanation.
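
One way to look for mismatched sublists is to count how often each length occurs. This is only a sketch: the small `all_feats` below is made-up stand-in data, and it assumes the most common length is the correct one, which may not hold for every dataset.

```python
from collections import Counter

# Hypothetical stand-in for the real all_feats (600 sublists in the question).
all_feats = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9, 1.0]]

# Count how many sublists have each length.
length_counts = Counter(len(row) for row in all_feats)

# Assumption: the most common length is the intended one.
expected_len = length_counts.most_common(1)[0][0]

# Indices of the sublists whose length differs from the expected one.
bad_rows = [i for i, row in enumerate(all_feats) if len(row) != expected_len]

print(length_counts)
print(bad_rows)
```

If `bad_rows` comes back empty, the sublists are already uniform and the error probably has another cause.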

Antimony
  • Yep, fixed it. Simply found the minimum length of all the sublists and resized the bad apple! Python slicing to the rescue. – ohbrobig May 16 '17 at 01:07
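
The fix described in the last comment (find the minimum sublist length, then slice every sublist down to it) might look roughly like the sketch below. The sample values are made up, and `min_len` and `X` are names chosen here for illustration. Note that truncating discards the trailing values of longer sublists, so it only makes sense when those extra values can be dropped.

```python
import numpy as np

# Hypothetical stand-in for the real all_feats; the last row is one element too long.
all_feats = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 10.0]]

# Length of the shortest sublist.
min_len = min(len(row) for row in all_feats)

# Slice every sublist to that length so the result is a rectangular 2D array.
X = np.array([row[:min_len] for row in all_feats], dtype=float)

print(X.shape)
```

Once `X` is a proper 2D array, it can be passed to `sklearn.preprocessing.scale`, the function that appears in the traceback above.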