
I am facing a small problem when accessing a variable from a different file. I have two files, train_dataset.py and test_dataset.py. I run train_dataset.py from my IDE and note the value of the array variable array_val, shown below.

    array([[ 0.08695652,  0.66459627,  0.08695652,  0.07453416,  0.07453416,
            ... 0.15217391]])

Now I switch to test_dataset.py, import train_dataset, and print array_val via train_dataset.array_val. I see a very different output, given below.

    array([[  8.11594203e-01,   1.15942029e-01,   4.05797101e-01,
            ... 1.30434783e-01,   5.65217391e-01,   2.02898551e-01]])

Please explain the reason for the discrepancy and suggest how I can get rid of it.

Here is the code in my train_dataset.py:

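    from sklearn.cluster import KMeans

    # matrix_for_cluster is defined elsewhere in this script (not shown)
    no_of_clusters = 9
    cluster_centroids = []
    k_means = KMeans(n_clusters=no_of_clusters, n_init=14, max_iter=400)

    k_means.fit(matrix_for_cluster)

    labels = k_means.labels_
    array_val = k_means.cluster_centers_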

Here, matrix_for_cluster is a NumPy n-dimensional array.

In my test_dataset.py, all I do is:

import train_dataset
print train_dataset.array_val
Konstantin
Sam

1 Answer


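This is probably due to the random initialization of the k-means algorithm: each run starts from different initial centroids, so it can converge to different cluster centers.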
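As a rough illustration of the role of random initialization, the sketch below fits KMeans twice with the same seed and compares the centers. The toy data `X` and the helper `fit_centers` are hypothetical stand-ins (the real `matrix_for_cluster` is not shown in the question), and the code uses Python 3 syntax. `random_state` is a standard KMeans parameter.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data standing in for matrix_for_cluster.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)


def fit_centers(seed):
    """Fit KMeans with a fixed seed and return the cluster centers."""
    k_means = KMeans(n_clusters=9, n_init=14, max_iter=400, random_state=seed)
    return k_means.fit(X).cluster_centers_


centers_a = fit_centers(seed=0)
centers_b = fit_centers(seed=0)

# Same seed and same data: the two fits should agree.
print(np.allclose(centers_a, centers_b))
```

Fixing the seed only makes repeated runs reproducible. The clustering is still re-fitted whenever the script runs, so it does not replace storing the result.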

As @ali_m explains nicely in the comments, the line import train_dataset re-runs the clustering, so the cluster centers from the previous run are not actually saved. To keep them, you can serialise the data, for example with `np.save` as @ali_m suggests.
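A minimal sketch of storing the centers on disk and reading them back, assuming NumPy's `np.save` and `np.load`. The path `CENTROIDS_PATH`, the two helper functions, and the random stand-in array are hypothetical. In the real scripts the array would be `k_means.cluster_centers_`. The code uses Python 3 syntax.

```python
import os
import tempfile

import numpy as np

# Hypothetical file location; any writable path ending in .npy works.
CENTROIDS_PATH = os.path.join(tempfile.gettempdir(), "centroids.npy")


def save_centroids(centroids, path=CENTROIDS_PATH):
    """Write the centroid array to disk (would live in train_dataset.py)."""
    np.save(path, centroids)


def load_centroids(path=CENTROIDS_PATH):
    """Read the stored centroids back without re-running the clustering
    (would live in test_dataset.py)."""
    return np.load(path)


# Stand-in for k_means.cluster_centers_: 9 centroids with 5 features each.
array_val = np.random.rand(9, 5)
save_centroids(array_val)

restored = load_centroids()
print(np.array_equal(array_val, restored))
```

With this layout, test_dataset.py would call only `load_centroids()` and would not import the module that fits the model, so the values it sees are the ones written by the training run.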

YXD
  • Yes, but I run it only once. Then, from a different shell, I just access the variable that holds the array of centroid positions. Do you mean that the centroids are re-initialized when I access the variable? – Sam Apr 20 '15 at 13:07
  • Run it the same way twice. Do you get the same results? – YXD Apr 20 '15 at 13:09
  • No, the results differ between runs, which I am aware of. What I cannot understand is this: when I run the model once, save the centroid points to a variable, and access it from a different shell (train_dataset.array_val), it shows a different output, but when I access array_val in the same shell, it gives the same output. – Sam Apr 20 '15 at 13:16
  • Try adding the line `print array_val` after `array_val=k_means.cluster_centers_` in train_dataset.py – YXD Apr 20 '15 at 13:18
  • @user2404193 You are not really "saving" the centroids by assigning them to some variable in your script, since the clustering will still be re-run every time you import or reload your `train_dataset` script. To truly "save" the results, you should write them to some external file (e.g. using `np.save`). – ali_m Apr 20 '15 at 13:25
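
The point made in the last comment, that module-level code runs again on import or reload, can be sketched with a throwaway module. The module name `fake_train` and its contents are hypothetical. It draws a random number at the top level, much as train_dataset.py fits KMeans at the top level. The code uses Python 3 syntax.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny stand-in module whose top-level code produces a random value.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "fake_train.py"), "w") as f:
    f.write("import random\nvalue = random.random()\n")

sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import fake_train              # executes the module body once
first = fake_train.value

import fake_train              # already cached in sys.modules, nothing re-runs
cached = fake_train.value

importlib.reload(fake_train)   # executes the module body again
second = fake_train.value

print(first == cached, first == second)
```

A fresh interpreter or shell starts with an empty module cache, so its first `import` behaves like the reload here: the whole script runs again and produces new values.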