Convert Pandas dtype of dataframe

Question

I have a Pandas dataframe which is stored as an 'object', but I need to change the dataframe structure to an 'int' as the 'object' dtype will not process in the kmeans() function of numpy library

I have managed to convert each column of the dataframe into an float64,based on this example Pandas: change data type of columns but I can't change the whole thing into anything else.

 #create subset of user variables
 user.posts = user.posts.astype('int')
 user.views = user.views.astype('int')
 user.kudos = user.kudos.astype('int')

 Y = user[['posts','views','kudos']]
 #convert dataframe into float
 X.convert_objects(convert_numeric=True).dtypes

Out[205]:
 posts    float64
 views    float64
 kudos    float64
 dtype: object

This then causes issues when I try and run

K = range(1,10)

# scipy.cluster.vq.kmeans
KM = [kmeans(X,k) for k in K] # apply kmeans 1 to 10

I get the error

  --->KM = [kmeans(X,k) for k in K] # apply kmeans 1 to 10
  ^

  AttributeError: 'DataFrame' object has no attribute 'dtype'

What is the issue kmeans is having with either the K or X dataframe, and how can it be resolved? Thanks

Answered! user.posts = user.posts.astype('float') user.views = user.views.astype('float') user.kudos = user.kudos.astype('float') Y = user[['posts','views','kudos']].values — conr404, Mar 02 '15 at 19:36

score 4 · Accepted Answer · edited May 23 '17 at 12:00

4

save it just as the values, not the objects. per this post How to convert a pandas DataFrame subset of columns AND rows into a numpy array?

user.posts = user.posts.astype('float')
user.views = user.views.astype('float')
user.kudos = user.kudos.astype('float')

Y = user[['posts','views','kudos']].values

edited May 23 '17 at 12:00

Community

1
1

answered Mar 02 '15 at 19:37

conr404

305
2
4
19

Convert Pandas dtype of dataframe

1 Answers1