I have a Pandas dataframe which is stored as an 'object', but I need to change the dataframe structure to an 'int' as the 'object' dtype will not process in the kmeans() function of numpy library
I have managed to convert each column of the dataframe into an float64,based on this example Pandas: change data type of columns but I can't change the whole thing into anything else.
#create subset of user variables
user.posts = user.posts.astype('int')
user.views = user.views.astype('int')
user.kudos = user.kudos.astype('int')
Y = user[['posts','views','kudos']]
#convert dataframe into float
X.convert_objects(convert_numeric=True).dtypes
Out[205]:
posts float64
views float64
kudos float64
dtype: object
This then causes issues when I try and run
K = range(1,10)
# scipy.cluster.vq.kmeans
KM = [kmeans(X,k) for k in K] # apply kmeans 1 to 10
I get the error
--->KM = [kmeans(X,k) for k in K] # apply kmeans 1 to 10
^
AttributeError: 'DataFrame' object has no attribute 'dtype'
What is the issue kmeans is having with either the K or X dataframe, and how can it be resolved? Thanks