Creating a NumPy array of lists from a DataFrame using Python

Question

The Scenario

The DataFrame is read from a comma-separated CSV file containing 3 columns. The values of the 2nd and 3rd columns need to be collected as [x, y] lists and stored in an array. This format is mandatory, since sklearn.cluster.KMeans takes input only in this format.

The Problem

I've already written a function that reads the two values from each row, so that they can be put in a list, which is then added to the array.

CODE:

def gen_array():
    co_x, co_y = 0,0
    for i in range(1584471):      # number of records in the list
        co_x = df.iloc[i]['iid']
        # print co_x
        co_y = df.iloc[i]['rat']
        # print co_y
    return co_x, co_y

e1,e2 = gen_array()
list1 = [e1, e2]
np_array = np.hstack(list1)

Currently, when I print list1 and np_array, they contain only the last value processed in the function, whereas I need all the records from the loop to be stored. I've already tried the append(), vstack() and hstack() methods to do this.
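The loop above overwrites co_x and co_y on every pass, so only the last row survives the return. A minimal sketch of two ways to keep every row, assuming a small made-up DataFrame in place of the real CSV (only the column names 'iid' and 'rat' come from the question; the 'uid' column and all values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data; the real file has about 1.5 million rows.
df = pd.DataFrame({
    "uid": [10, 11, 12],
    "iid": [1.0, 5.0, 1.5],
    "rat": [2.0, 8.0, 1.8],
})

# Vectorized: select both columns with a list of labels, then convert to a 2-D array.
X = df[["iid", "rat"]].to_numpy()  # on older pandas versions, .values does the same job

# Loop-based equivalent: append each [x, y] pair instead of overwriting two variables.
rows = []
for i in range(len(df)):
    rows.append([df.iloc[i]["iid"], df.iloc[i]["rat"]])
X_loop = np.array(rows)

print(X.tolist())  # [[1.0, 2.0], [5.0, 8.0], [1.5, 1.8]]
```

Both X and X_loop have shape (n_rows, 2), matching the layout of the required format. For a file of this size the vectorized version is likely to be far faster, since row-by-row df.iloc access is slow.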

Required format of Data

[[x1, y1], [x2, y2], [x3, y3], ...[xn, yn]]

Notice the commas and brackets

P.S. I've tried repr, but that shows "array([[]])" at the start, which is not required.

The Need

Here's sample code showing how values are passed to the KMeans fit method:

X = np.array([[1, 2],
              [5, 8],
              [1.5, 1.8],
              [8, 8],
              [1, 0.6],
              [9, 11]])
kmeans = KMeans(n_clusters=2)
kmeans.fit(X)

Notice how the variable X holds the values.

Failed references

EDIT

kmeans = KMeans(n_clusters=3).fit(df['iid','rat'])

centroids = kmeans.cluster_centers_
label = kmeans.labels_

print "CENTROIDS :", centroids
print "LABEL :",label

colors = {"g.", "r.", "b."}

for i in range(len(df['iid','rat'])):
    print "Coordinate", df['iid','rat'][i], "label:", label[i]
    plt.plot(df['iid','rat'][i][0], df['iid','rat'][i][1], markersize = 10)

plt.scatter(centroids[:, 0], centroids[:, 1], centroids[:, 2], marker='x', s=150, linewidths= 5, zorder = 10)
plt.show()

The traceback passes through many other internal files and ends with:

  File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)
KeyError: ('iid', 'rat')
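The KeyError names the tuple ('iid', 'rat') because df['iid','rat'] looks for a single column with that tuple as its label. A small sketch of the difference, again using a hypothetical three-row DataFrame rather than the real data:

```python
import pandas as pd

# Hypothetical data; only the column names 'iid' and 'rat' come from the question.
df = pd.DataFrame({
    "uid": [10, 11, 12],
    "iid": [1.0, 5.0, 1.5],
    "rat": [2.0, 8.0, 1.8],
})

# Single brackets with two names: pandas receives the tuple ('iid', 'rat')
# as one column label, which does not exist, so a KeyError is raised.
try:
    df["iid", "rat"]
    raised = False
except KeyError:
    raised = True

# Double brackets: a list of labels selects both columns and returns a DataFrame.
features = df[["iid", "rat"]]

# Individual points are read by position with .iloc rather than features[i].
first_point = features.iloc[0].tolist()

print(raised, features.shape, first_point)
```

The two-column selection has one row per record and one column per feature, so it is the object to pass to fit in place of df['iid','rat'], and the same selection can be reused in the printing and plotting loop.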

SKLearn methods also accept DataFrames as it is, so there is no need to convert them. Just use it this way: `kmeans.fit(df.iloc[:, 1:3])` — MaxU - stand with Ukraine, Aug 12 '17 at 10:49
@ayhan you should've answered better there rather marking this as duplicate! I just improved on specificity of my question here and makes all together a different sense now. — T3J45, Aug 12 '17 at 10:54
@MaxU I could use that, but I need more than once in the code. So, it's better in a variable. Any more ideas? — T3J45, Aug 12 '17 at 10:56
i'd use `df[column_list]` approach or the one i've shown you above instead of making a new variable (i.e. multiplying redundant data in memory) — MaxU - stand with Ukraine, Aug 12 '17 at 10:58
Did that, got many errors. Just check in EDIT section, I'm updating. Correct if wrong. — T3J45, Aug 12 '17 at 11:01
you did it wrong - use this: `df[['iid','rat']]` instead of `df['iid','rat']` — MaxU - stand with Ukraine, Aug 12 '17 at 11:10
The answer there was *good enough* to show you that you need two brackets. — ayhan, Aug 12 '17 at 11:12
Looks like Jezrael replied recently that solved the problem @ayhan So before marking atleast you could ping in comments here. — T3J45, Aug 12 '17 at 11:14
Much greatful for your time @MaxU I've received my answer there. The previously asked question has a linking here and being marked Duplicate by a Great reputed person here, so got to quit this now. — T3J45, Aug 12 '17 at 11:16
@T3J45, actually i agree with ayhan. I didn't see your previous question and corresponding answers before. I think [juanpa.arrivillaga gave you very good answer there, explaining the difference](https://stackoverflow.com/a/45620966/5741205)... — MaxU - stand with Ukraine, Aug 12 '17 at 11:21
@MaxU Well, Ya. It was in good depth however if it doesn't help how do I agree to it. For the matter of fact I did wait for the answer there until now. Now that it is solved, I'll remove this. — T3J45, Aug 12 '17 at 11:25
