I have a pandas DataFrame with a column called 'X', where each cell holds a list of 300 doubles, and a column called 'label'. When I try to run:

cls = SVC()
cls.fit(miniset.loc[:,'X'],miniset.loc[:,'label'])

I get the error: ValueError: setting an array element with a sequence.

Any idea how to fix it?

Thanks

Head of my DataFrame:

  label                                                  X
0      0  [-1.1990741, 0.98229957, -2.7413394, 0.5774205...
1      1  [0.10277234, 1.8292198, -1.8241594, 0.07206603...
2      0  [-0.26603428, 1.8654639, -2.2495375, -0.695124...
3      0  [-1.1662953, 3.0714324, -3.4975948, 0.01011618...
4      0  [-0.13769871, 1.9866339, -1.9885212, -0.830097...
desertnaut
Tal Baumel

2 Answers

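Your issue is the 'X' column of your DataFrame. SVC (like basically any scikit-learn model) expects a 2D numeric input of shape (n_samples, n_features), but a column of lists is converted to a 1D array of Python objects, which triggers the ValueError. To get this to work, you need to split that column into several columns, one for each element in your lists.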

You can fix that by doing something like this.
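A minimal sketch of one way to do the split, assuming every list in 'X' has the same length. The toy `miniset` below (6 rows, 4 values per list) and the `x_` column prefix are made up for illustration; the real data would have 300 values per row.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

# Hypothetical stand-in for miniset: each 'X' cell holds a list of 4 floats
rng = np.random.default_rng(0)
miniset = pd.DataFrame({
    "label": [0, 1, 0, 0, 1, 1],
    "X": [rng.normal(size=4).tolist() for _ in range(6)],
})

# Expand the list column into one numeric column per list element,
# keeping the original row index so rows still line up with 'label'
features = pd.DataFrame(miniset["X"].tolist(), index=miniset.index)
features = features.add_prefix("x_")  # illustrative names: x_0, x_1, ...

cls = SVC()
cls.fit(features, miniset["label"])
```

Building `features` as a separate DataFrame leaves `miniset` itself unchanged, so the original frame does not gain hundreds of extra columns.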

The pandas package is not intended to store lists or other collections as values. It is meant to store panel data, hence the name pandas.

brentertainer
  • Didn't want to corrupt my DataFrame with 300 columns so ended up running `cls.fit([x for x in dataset['X']],dataset.loc[:,'label'])` – Tal Baumel Aug 07 '19 at 13:58

You can try:

cls.fit(np.array(miniset.loc[:,'X'].tolist()),miniset.loc[:,'label'])

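where tolist() gives you a list of lists, which np.array then turns into a 2D array of shape (n_samples, 300). The list of lists on its own would already be good enough for fit.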
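To see why the conversion matters, here is a small sketch comparing the shape of the raw column with the converted array. The 4-row `miniset` with 3 values per list is a made-up stand-in for the real data.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

# Hypothetical stand-in for miniset, with 3 values per list instead of 300
miniset = pd.DataFrame({
    "label": [0, 1, 0, 1],
    "X": [[0.0, 1.0, 2.0], [1.0, 0.0, 2.0], [0.5, 1.5, 2.5], [1.5, 0.5, 2.5]],
})

# The raw column is a 1D array whose elements are Python lists
as_series = miniset.loc[:, "X"].to_numpy()
print(as_series.shape, as_series.dtype)  # (4,) object

# Going through tolist() yields a proper 2D numeric array
X = np.array(miniset.loc[:, "X"].tolist())
print(X.shape, X.dtype)  # (4, 3) float64

cls = SVC()
cls.fit(X, miniset.loc[:, "label"])
```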

Quang Hoang