The Scenario
The DataFrame is read from a comma-separated CSV file with 3 columns. The values of the 2nd and 3rd columns need to be collected as [x, y] pairs and stored in an array. This format is mandatory, since sklearn.cluster.KMeans takes input only in this format.
The Problem
I've already written a function that generates the values and passes them into a list, which is then meant to be added to an array.
CODE:
def gen_array():
    co_x, co_y = 0, 0
    for i in range(1584471):  # number of records in the list
        co_x = df.iloc[i]['iid']
        # print co_x
        co_y = df.iloc[i]['rat']
        # print co_y
    return co_x, co_y

e1, e2 = gen_array()
list1 = [e1, e2]
np_array = np.hstack(list1)
Currently, when I print list1 and np_array, they contain only the last value processed in the function, whereas I need all records from the loop to be stored. I've already tried the append(), vstack() and hstack() methods to do this.
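A likely explanation, offered as a sketch rather than a definitive fix: co_x and co_y are overwritten on every pass through the loop, so only the values from the last row are left when the function returns. Selecting both columns at once avoids the loop entirely. The small DataFrame below is made-up sample data standing in for the real CSV, and the first column name `uid` is hypothetical; only `iid` and `rat` come from the code above.

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for the real CSV ('uid' is a hypothetical name)
df = pd.DataFrame({
    "uid": [1, 2, 3],
    "iid": [10, 20, 30],
    "rat": [4.0, 3.5, 5.0],
})

# Select the two columns with a list of names, then take the underlying
# 2-D NumPy array: one row per record, one column per feature
np_array = df[["iid", "rat"]].values

print(np_array.shape)  # (3, 2)
```

The result has one [iid, rat] pair per row, which is the same layout as the required format.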
Required format of Data
[[x1, y1], [x2, y2], [x3, y3], ...[xn, yn]]
Notice the commas and brackets.
P.S. I've tried repr, but that shows "array([[]])" at the start, which is not required.
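For what it's worth, the `array(...)` wrapper is only how NumPy displays an array, not part of the data. If a plain nested list with commas and brackets is wanted anyway, `tolist()` produces one. A minimal sketch with made-up values:

```python
import numpy as np

# Made-up values, only to show the difference between the two forms
X = np.array([[1, 2], [5, 8], [1.5, 1.8]])

print(repr(X))       # display form, wrapped in array(...)

nested = X.tolist()  # plain Python list of [x, y] lists
print(nested)        # [[1.0, 2.0], [5.0, 8.0], [1.5, 1.8]]
```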
The Need
Here's sample code showing how to pass values to the KMeans fit method:
X = np.array([[1, 2],
              [5, 8],
              [1.5, 1.8],
              [8, 8],
              [1, 0.6],
              [9, 11]])
kmeans = KMeans(n_clusters=2)
kmeans.fit(X)
Notice how the variable X holds the values.
Failed references
EDIT
kmeans = KMeans(n_clusters=3).fit(df['iid','rat'])
centroids = kmeans.cluster_centers_
label = kmeans.labels_

print "CENTROIDS :", centroids
print "LABEL :", label

colors = {"g.", "r.", "b."}
for i in range(len(df['iid','rat'])):
    print "Coordinate", df['iid','rat'][i], "label:", label[i]
    plt.plot(df['iid','rat'][i][0], df['iid','rat'][i][1], markersize = 10)

plt.scatter(centroids[:, 0], centroids[:, 1], centroids[:, 2], marker='x', s=150, linewidths= 5, zorder = 10)
plt.show()
This returns a long traceback through many internal pandas files and ends with:
File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)
File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)
KeyError: ('iid', 'rat')
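A note on the error, as a hedged sketch: `df['iid','rat']` is the same as `df[('iid', 'rat')]`, so pandas looks for a single column whose label is the tuple `('iid', 'rat')` and raises KeyError when it doesn't exist. Selecting several columns takes a list of names, i.e. double brackets. The DataFrame below is made-up sample data; only the column names `iid` and `rat` come from the code above.

```python
import pandas as pd

# Made-up sample data with the two column names used in the question
df = pd.DataFrame({"iid": [10, 20, 30], "rat": [4.0, 3.5, 5.0]})

try:
    df["iid", "rat"]   # one tuple key, not two column names
    raised = False
except KeyError:
    raised = True

# A list of names selects both columns and returns a two-column DataFrame
features = df[["iid", "rat"]]

# Rows of the selection are reached by position with .iloc, not with [i]
first_row = features.iloc[0]

print(raised, features.shape)
```

A two-column selection like `features` (or its `.values` array) has the rows-by-features shape that a fit call expects.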