How to convert a Python list of lists to a 2D numpy array for sklearn.preprocessing

Question

I currently have a list that holds all of the input for an sklearn classifier. Each element of that list represents one song in my dataset and is itself a list of features.

I need to convert this structure to a 2D numpy array so I can scale my data via sklearn's preprocessing. This is proving to be very difficult.

    y = []
    all_feats = []
    for song in data:
        mfccs_in_song = song[0]
        oned_mfccs_in_song = []
        for frame in mfccs_in_song:
            for m in frame:
                oned_mfccs_in_song.append(m)
        all_feats.append(oned_mfccs_in_song)
        label = song[-1]
        y.append(label)

Long story short, `all_feats` is that list of lists, and it has a length of 600. How can I convert it to a numpy array for preprocessing? I have tried numerous things, including simply `all_feats = np.array(all_feats)`, but that does not work.

What is the issue with using `all_feats = np.array(all_feats)`? Does it give an error? What error? — Antimony, May 16 '17 at 00:35
File "/Library/Python/2.7/site-packages/sklearn/preprocessing/data.py", line 129, in scale dtype=FLOAT_DTYPES) File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", line 382, in check_array array = np.array(array, dtype=dtype, order=order, copy=copy) ValueError: setting an array element with a sequence. — ohbrobig, May 16 '17 at 00:39
Take a look at the suggestions/answers given [here](http://stackoverflow.com/questions/4674473/valueerror-setting-an-array-element-with-a-sequence). More specifically, look at the resulting `all_feats`. It may not have sublists of the same size. — Antimony, May 16 '17 at 00:40
You were right. There was one sublist out of the 600 that wasn't the same size! — ohbrobig, May 16 '17 at 00:58
Great, I'll add my suggestion as an answer and you can accept it :D — Antimony, May 16 '17 at 01:00

score 1 · Accepted Answer · edited May 23 '17 at 12:03

That error suggests that the sublists of `all_feats` are not all the same size. Take a look at its contents. Once you have figured out the right length for the sublists and how to prune the extra elements, you can run `all_feats = np.array(all_feats)` and it should work!
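One way to check for uneven sublists is to count how often each length occurs. The snippet below is only a sketch: the small `all_feats` is made-up stand-in data, and treating the most common length as the "right" one is an assumption that may not hold for every dataset.

```python
from collections import Counter

# Made-up stand-in for all_feats: three rows, one of them too long.
all_feats = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9, 1.0]]

# Count how often each sublist length occurs.
length_counts = Counter(len(row) for row in all_feats)

# Assumption: the most common length is the intended one.
expected_len = length_counts.most_common(1)[0][0]

# Indices of the sublists whose length differs from the expected one.
bad_rows = [i for i, row in enumerate(all_feats) if len(row) != expected_len]

print(length_counts)
print(bad_rows)
```

If `bad_rows` comes back empty, the sublists are already rectangular and the problem lies elsewhere.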

Take a look at the answers in this link for more explanation.

answered May 16 '17 at 01:02

Antimony

Yep, fixed it. Simply found the minimum length of all the sublists and resized the bad apple! Python slicing to the rescue. – ohbrobig May 16 '17 at 01:07
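
The fix described in this comment could look roughly like the following. It is a sketch with made-up data, not the asker's actual code, and it assumes that dropping the trailing features of the longer rows is acceptable for the dataset.

```python
import numpy as np

# Made-up stand-in for all_feats: the last row has one extra feature.
all_feats = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 10.0]]

# Find the shortest sublist and slice every row down to that length.
min_len = min(len(row) for row in all_feats)
trimmed = [row[:min_len] for row in all_feats]

# All rows now have equal length, so this yields a proper 2D float array.
X = np.array(trimmed, dtype=float)
print(X.shape)
```

The resulting `X` has shape `(n_songs, n_features)` and can then be handed to `sklearn.preprocessing.scale`, the function that appears in the traceback above.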
