Discrepancy in array value when calling a variable from different file - python

Question

I am running into a small problem when reading a variable from a different file. I have two files, train_dataset.py and test_dataset.py. I run train_dataset.py from my IDE and note the value of the array variable array_val, shown below.

    array([[ 0.08695652,  0.66459627,  0.08695652,  0.07453416,  0.07453416,
            ... 0.15217391]])

Now I switch to test_dataset.py, import train_dataset, and print array_val via train_dataset.array_val. I see a very different output, given below.

    array([[  8.11594203e-01,   1.15942029e-01,   4.05797101e-01,
            ... 1.30434783e-01,   5.65217391e-01,   2.02898551e-01]])

Please explain the reason for the discrepancy and suggest how I can get rid of it.

Here is the code in my train_dataset.py:

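    from sklearn.cluster import KMeans

    no_of_clusters = 9
    cluster_centroids = []
    # No random_state is set, so each run starts from new random centroids.
    k_means = KMeans(n_clusters=no_of_clusters, n_init=14, max_iter=400)

    # matrix_for_cluster is a NumPy array built elsewhere (not shown).
    k_means.fit(matrix_for_cluster)

    labels = k_means.labels_
    array_val = k_means.cluster_centers_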

Here matrix_for_cluster is a NumPy n-dimensional array.

In my test_dataset.py all I do is

    import train_dataset
    print train_dataset.array_val

I have updated the code. Please have a look. – Sam Apr 20 '15 at 12:52

YXD · Answer 1 · 2015-04-20T14:06:46.303

3

This is probably due to the random initialization of the k-means algorithm:

http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
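
To illustrate the role of the random initialization, here is a minimal sketch that pins the seed through the `random_state` parameter of `KMeans`. The data is a synthetic stand-in for matrix_for_cluster and the helper name `fit_centers` is made up for this example; neither comes from the question.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for matrix_for_cluster (an assumption, not the asker's data).
rng = np.random.RandomState(0)
matrix_for_cluster = rng.rand(200, 5)


def fit_centers(seed):
    """Fit k-means with a fixed random_state and return the cluster centers."""
    k_means = KMeans(n_clusters=9, n_init=14, max_iter=400, random_state=seed)
    k_means.fit(matrix_for_cluster)
    return k_means.cluster_centers_


# Two independent fits with the same seed start from the same initial centroids.
centers_a = fit_centers(seed=0)
centers_b = fit_centers(seed=0)
print(np.allclose(centers_a, centers_b))
```

With `random_state` left unset, as in the question, each fit draws fresh initial centroids, so the centers (or at least their order) can differ from run to run.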

As @ali_m explains nicely in the comments, the line import train_dataset re-runs the clustering, so the cluster centers from the previous run are not actually saved. To keep them, you can serialise the data to a file (for example with np.save, as suggested in the comments).
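
One possible way to serialise the centers is `np.save` / `np.load`, the option mentioned in the comments. The sketch below is only an illustration: the file name `cluster_centers.npy`, the temporary directory, and the random stand-in array are assumptions, not part of the original code.

```python
import os
import tempfile

import numpy as np

# Stand-in for k_means.cluster_centers_ (hypothetical values).
array_val = np.random.RandomState(0).rand(9, 5)

# Hypothetical location; in practice choose a path both scripts can reach.
centers_path = os.path.join(tempfile.mkdtemp(), "cluster_centers.npy")

# In train_dataset.py: write the centers once, right after fitting.
np.save(centers_path, array_val)

# In test_dataset.py: read the stored centers instead of importing train_dataset.
loaded_centers = np.load(centers_path)
print(np.array_equal(array_val, loaded_centers))
```

Because test_dataset.py then reads the file rather than importing the training script, the clustering is not re-run and the stored values stay fixed.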

edited Apr 20 '15 at 14:06

answered Apr 20 '15 at 12:54

YXD


Yes, but I run it only once. Then, from a different shell, I just read the variable that holds the array of centroid positions. Do you mean that the centroids are re-initialized when I access the variable? – Sam Apr 20 '15 at 13:07
Run it the same way twice. Do you get the same results? – YXD Apr 20 '15 at 13:09
No, I am aware that results differ between runs. What I am unable to understand is this: when I run the model once, save the centroid points to a variable, and read it from a different shell (train_dataset.array_val), it shows a different output, but when I just evaluate array_val in the same shell it gives the same output. – Sam Apr 20 '15 at 13:16
Try adding the line `print array_val` after `array_val=k_means.cluster_centers_` in train_dataset.py – YXD Apr 20 '15 at 13:18
@user2404193 You are not really "saving" the centroids by assigning them to some variable in your script, since the clustering will still be re-run every time you import or reload your `train_dataset` script. To truly "save" the results, you should write them to some external file (e.g. using `np.save`). – ali_m Apr 20 '15 at 13:25
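
The re-execution described in this comment can be reproduced with the standard library alone. In this rough sketch, `demo_train` is a hypothetical throwaway module whose top-level code draws a random number, standing in for the k-means fit in train_dataset.py.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for train_dataset.py: the top-level code draws a
# random number, much as k-means draws random initial centroids.
MODULE_SOURCE = "import random\narray_val = random.random()\n"


def import_then_reload():
    """Return the module-level value after the first import and after a reload."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        with open(os.path.join(tmp_dir, "demo_train.py"), "w") as handle:
            handle.write(MODULE_SOURCE)
        sys.path.insert(0, tmp_dir)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_train")
            first = module.array_val
            # Reloading executes the module's top-level code again.
            importlib.reload(module)
            second = module.array_val
        finally:
            sys.path.remove(tmp_dir)
            sys.modules.pop("demo_train", None)
    return first, second


first_value, second_value = import_then_reload()
print(first_value, second_value)
```

The same thing happens when a fresh interpreter or shell imports the module for the first time: the assignment to array_val is recomputed rather than recalled from the earlier run.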
