
My Dataset

  • Stored in a NumPy array
  • np.shape(data) -> (6989, 4)
  • stats.describe(data) -> DescribeResult(nobs=6989, minmax=(array([0., 0., 0., 0.]), array([ 299.99, 86785. , 10997. , 13222. ])), mean=array([ 12.47994992, 3407.00243239, 27.23293747, 109.72370869]), variance=array([1.42652452e+02, 4.71755188e+07, 6.17027586e+04, 2.92787820e+05]), skewness=array([ 4.27783176, 4.50762479, 31.57678605, 15.68962365]), kurtosis=array([ 58.23586935, 27.33838487, 1163.74537023, 302.6384056 ]))
  • stats.describe(clusterer.labels_) -> DescribeResult(nobs=6989, minmax=(array([0., 0., 0., 0.]), array([ 299.99, 86785. , 10997. , 13222. ])), mean=array([ 12.47994992, 3407.00243239, 27.23293747, 109.72370869]), variance=array([1.42652452e+02, 4.71755188e+07, 6.17027586e+04, 2.92787820e+05]), skewness=array([ 4.27783176, 4.50762479, 31.57678605, 15.68962365]), kurtosis=array([ 58.23586935, 27.33838487, 1163.74537023, 302.6384056 ]))
  • np.shape(clusterer.labels_) -> (6989,)

Original Dataset

CODE (all code is from the original guide that I am following)

color_palette = sns.color_palette('Paired', 12)
cluster_colors = [color_palette[x] if x >= 0
                  else (0.5, 0.5, 0.5)
                  for x in clusterer.labels_]
cluster_member_colors = [sns.desaturate(x, p) for x, p in zip(cluster_colors, clusterer.probabilities_)]
plt.scatter(*projection.T, 
            s=20, 
            linewidth=0, 
            c=cluster_member_colors, 
            alpha=0.25)

ERROR

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-175-64c069b8643a> in <module>
      2 cluster_colors = [color_palette[x] if x >= 0
      3                   else (0.5, 0.5, 0.5)
----> 4                   for x in clusterer.labels_]
      5 cluster_member_colors = [sns.desaturate(x, p) for x, p in zip(cluster_colors, clusterer.probabilities_)]
      6 plt.scatter(*projection.T, 

<ipython-input-175-64c069b8643a> in <listcomp>(.0)
      2 cluster_colors = [color_palette[x] if x >= 0
      3                   else (0.5, 0.5, 0.5)
----> 4                   for x in clusterer.labels_]
      5 cluster_member_colors = [sns.desaturate(x, p) for x, p in zip(cluster_colors, clusterer.probabilities_)]
      6 plt.scatter(*projection.T, 

IndexError: list index out of range

Tried Solutions

  • I have no NaN values: I tried print(np.isnan(np.sum(clusterer.labels_))) and it was False.
  • From this answer I can see what the problem is programmatically: list indices start at 0, so an index can run past the end of the list. What puzzles me is that the same code runs without error on the original dataset but raises the error on mine. - https://stackoverflow.com/a/1098660/10270590
  • The error points to the `cluster_colors` list comprehension. Within that you index `color_palette`. I'm guessing that's a list. So the error is in the values in `clusterer.labels_`. At least one is too large to index the palette. – hpaulj Mar 31 '21 at 16:16
  • I have switched to `color_palette = sns.color_palette('Paired', 1000)` (it was 12 before) and it is working now. But how many should it be? – sogu Mar 31 '21 at 16:23
  • @hpaulj if you post your comment as an answer, I will accept it as the correct answer to my question. Thank you. – sogu Apr 01 '21 at 09:48
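
The diagnosis in the comments can be checked directly. The sketch below is only an illustration: `labels` is a hypothetical stand-in for `clusterer.labels_` and `palette` is a hypothetical stand-in for the 12-entry list returned by `sns.color_palette('Paired', 12)`. It assumes the usual convention that noise points are labelled -1 and clusters are numbered from 0.

```python
# Hypothetical stand-in for clusterer.labels_ (-1 = noise, clusters start at 0).
labels = [-1, 0, 3, 11, 12, 40]

# Hypothetical stand-in for sns.color_palette('Paired', 12): a list of 12 RGB tuples.
palette = [(i / 12, 0.5, 0.5) for i in range(12)]

# The largest label decides how many palette entries are needed.
n_needed = max(labels) + 1

# Labels that cannot be used as an index into the palette.
too_large = sorted({x for x in labels if x >= len(palette)})

print("palette entries needed:", n_needed)
print("labels that would raise IndexError:", too_large)
```

If `too_large` is non-empty for the real labels, the dataset simply produced more clusters than the palette has entries, which would explain why the same code runs on one dataset and fails on another.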

1 Answer


The issue was solved by requesting more colors from the palette, so that every cluster label has an entry to index. For example:

color_palette = sns.color_palette('Paired', 1000)
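
As for how many colors are needed: one per cluster, i.e. the largest label plus one, rather than a fixed number like 1000. Below is a minimal sketch of that idea, not code from the guide. The helper `colors_for_labels` and the factory `demo_palette` are hypothetical names introduced here, and the factory exists only so the sketch runs without seaborn.

```python
def colors_for_labels(labels, make_palette, noise_color=(0.5, 0.5, 0.5)):
    """Map cluster labels to colors, sizing the palette from the labels.

    labels       -- sequence of ints, -1 for noise, clusters numbered from 0
    make_palette -- callable taking n and returning a list of n colors
    """
    # One color per cluster; keep at least one entry even if every point is noise.
    n_clusters = max(int(max(labels)) + 1, 1)
    palette = make_palette(n_clusters)
    return [palette[x] if x >= 0 else noise_color for x in labels]


def demo_palette(n):
    # Hypothetical stand-in for a real palette function: n distinct RGB tuples.
    return [(i / n, 0.5, 0.5) for i in range(n)]


colors = colors_for_labels([-1, 0, 12, 40], demo_palette)
print(len(colors), colors[0])
```

With seaborn, the factory would presumably be `lambda n: sns.color_palette('Paired', n)` and the labels would be `clusterer.labels_`. As far as I know, 'Paired' only has 12 distinct colors and seaborn cycles through them when more are requested, so with many clusters some of them will share a color.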
