How to split a (569, 31) DataFrame into two with shapes (569, 30) and (569,)

Question

How do I split a (569, 31) DataFrame into two with shapes (569, 30) and (569,)?

The DataFrame has 31 columns.

df.columns yields this:

Index([u'mean radius', u'mean texture', u'mean perimeter', u'mean area',
       u'mean smoothness', u'mean compactness', u'mean concavity',
       u'mean concave points', u'mean symmetry', u'mean fractal dimension',
       u'radius error', u'texture error', u'perimeter error', u'area error',
       u'smoothness error', u'compactness error', u'concavity error',
       u'concave points error', u'symmetry error', u'fractal dimension error',
       u'worst radius', u'worst texture', u'worst perimeter', u'worst area',
       u'worst smoothness', u'worst compactness', u'worst concavity',
       u'worst concave points', u'worst symmetry', u'worst fractal dimension',
       u'target'],
      dtype='object')

I need to split it into two. I did something like this:

X = df.ix[:,'mean radius': 'worst fractal dimension']

y = df.ix[:,'target': ]

X.shape gives (569, 30), which is as expected, but y.shape gives (569, 1). I don't really understand the difference between (569,) and (569, 1), but the required answer has a shape of (569,).

score 2 · Answer 1 · answered Jul 01 '17 at 07:10

y.shape gives you (569, 1) because y = df.ix[:,'target': ] selects with a column slice, and a slice returns a DataFrame even when it covers only one column.

The difference between the shapes (569,) and (569, 1) is that (569,) belongs to a Series, which has only one dimension, while (569, 1) belongs to a DataFrame with two dimensions (569 rows and 1 column).

Calling y = df['target'] should return a Series.
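To make the difference concrete, here is a minimal sketch. It uses a small, made-up three-row frame as a stand-in for the 569-row one in the question (the values are hypothetical), and .loc rather than .ix so that it also runs on current pandas:

```python
import pandas as pd

# Hypothetical 3-row stand-in for the 569-row frame in the question.
df = pd.DataFrame({
    "mean radius": [17.99, 20.57, 19.69],
    "mean texture": [10.38, 17.77, 21.25],
    "target": [0, 0, 1],
})

y_frame = df.loc[:, "target":]  # column slice: a DataFrame, even with one column
y_series = df["target"]         # single column label: a Series

print(type(y_frame).__name__, y_frame.shape)    # DataFrame (3, 1)
print(type(y_series).__name__, y_series.shape)  # Series (3,)
```

The same pattern should apply to the full frame, with 569 in place of 3.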

Also note that the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers (see http://pandas.pydata.org/). It still works in the pandas versions that ship it, but it was removed entirely in pandas 1.0.

You can also convert a one-column DataFrame into a Series manually, as discussed for example here

To check the type of your variable, type(y) is very useful, and it helps solve similar issues.
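As a rough illustration of such a manual conversion, two common options are positional selection with .iloc and DataFrame.squeeze. This is only a sketch on a hypothetical one-column frame, and not necessarily the approach the linked discussion describes:

```python
import pandas as pd

# Hypothetical one-column frame with shape (3, 1).
y_frame = pd.DataFrame({"target": [0, 0, 1]})

y_a = y_frame.iloc[:, 0]                # pick the only column by position
y_b = y_frame.squeeze(axis="columns")   # collapse the length-1 column axis

print(y_a.shape, y_b.shape)  # (3,) (3,)
```

Both results are Series that keep the original row index.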

score 1 · Accepted Answer · answered Jun 10 '17 at 07:51


X = df[df.columns.drop('target')]
y = df['target']

Alternatively, you can change:

y = df.ix[:,'target': ]

to:

y = df.ix[:,'target']

PS: the .ix[] indexer is deprecated in modern pandas versions, so it's advised to use .loc[] instead.
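For reference, a sketch of the same split written with .loc. The frame here is a made-up three-row stand-in that keeps only the first and last feature columns from the question, so the shapes come out as (3, 2) and (3,) rather than (569, 30) and (569,):

```python
import pandas as pd

# Hypothetical stand-in: two feature columns plus the target.
df = pd.DataFrame({
    "mean radius": [17.99, 20.57, 19.69],
    "worst fractal dimension": [0.1189, 0.0890, 0.0876],
    "target": [0, 0, 1],
})

# Label slices with .loc include both endpoints.
X = df.loc[:, "mean radius":"worst fractal dimension"]
# A single label (no trailing colon) selects a Series.
y = df.loc[:, "target"]

print(X.shape, y.shape)  # (3, 2) (3,)
```

Dropping the trailing colon after 'target' is what changes the result from a one-column DataFrame to a Series.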


MaxU - stand with Ukraine

