
I'm trying to code this algorithm, but I'm struggling with it, and step 6 in particular is confusing me. My code so far is at the bottom.

  1. Set a positive value for K.
  2. Select K different rows from the data matrix at random.
  3. For each of the selected rows:
     a. Copy its values to a new list, let us call it c. Each element of c is a number. (At the end of step 3, you should have the lists c_1, c_2, …, c_K. Each of these should have the same number of columns as the data matrix.)
  4. For each row i in the data matrix:
     a. Calculate the Manhattan distance between data row i and each of the lists c_1, c_2, …, c_K.
     b. Assign row i to the cluster of the nearest c. For instance, if the nearest c is c_3, then assign row i to cluster 3 (i.e. you should have a list whose ith entry is equal to 3; let's call this list S).
  5. If the previous step does not change S, stop.
  6. For each k = 1, 2, …, K:
     a. Update c_k. Each element j of c_k should be equal to the median of column j, taking into consideration only those rows that have been assigned to cluster k.
  7. Go to Step 4.

Notice that in the above, K is not the same thing as k: K is the total number of clusters, while k indexes one particular cluster.


# This is what I have so far:
def clustering(matrix,k):
    for i in k: 

I'm stuck on how to choose the rows randomly, and I also don't understand what steps 5 and 6 mean. Could someone explain?

ascripter
dee

1 Answer


You need np.random.choice.

Use this:

import numpy as np

# some data with 10 rows and 5 columns
X=np.random.rand(10,5)

def clustering(X,k):
    # create the random row indices (selector)
    random_selector = np.random.choice(range(X.shape[0]), size=k, replace=False) # replace=False to get k unique row indices

    # randomly select k rows of X
    sampled_X = X[random_selector] # sampled_X.shape == (k, 5) for the example data above

    # ... (the remaining steps of the algorithm go here)
    return #SOMETHING

Now you can continue working on your ?homework?
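
As for steps 5 and 6 in the question: step 5 is the stopping test (if no row changed cluster since the previous pass, you are done), and step 6 moves each list c_k to the column-wise median of the rows currently assigned to cluster k. Below is a minimal sketch of the whole loop, not a definitive implementation. Several things in it are my own assumptions rather than part of the question: the function name `k_medians`, the `max_iter` and `seed` parameters, the 0-based cluster labels, and the choice to keep the old c_k when a cluster ends up empty.

```python
import numpy as np

def k_medians(X, K, max_iter=100, seed=None):
    """Sketch of steps 1-7: clustering with Manhattan distance and medians."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Steps 2-3: copy K distinct rows to act as the initial lists c_1..c_K.
    centres = X[rng.choice(X.shape[0], size=K, replace=False)].copy()

    S = None  # S[i] is the (0-based) cluster assigned to row i
    for _ in range(max_iter):
        # Step 4a: Manhattan distance from every row to every centre,
        # giving an array of shape (n_rows, K).
        dist = np.abs(X[:, None, :] - centres[None, :, :]).sum(axis=2)
        # Step 4b: index of the nearest centre for each row.
        new_S = dist.argmin(axis=1)

        # Step 5: stop once the assignment is unchanged.
        if S is not None and np.array_equal(new_S, S):
            break
        S = new_S

        # Step 6: each centre becomes the column-wise median of its rows.
        for k in range(K):
            members = X[S == k]
            if len(members) > 0:  # assumption: leave an empty cluster's centre as is
                centres[k] = np.median(members, axis=0)
        # Step 7: loop back to step 4.
    return S, centres

# Two well-separated groups of three rows each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
S, centres = k_medians(X, K=2, seed=0)
```

Here `S` plays the role of the list S from step 4, and `centres[k]` is the list c_k (numbered from 0 instead of 1).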

seralouk
  • Why not use `random.randint()`? (https://stackoverflow.com/q/14262654/11301900) – AMC Dec 18 '19 at 18:19
  • it's the same thing -- I prefer `choice` – seralouk Dec 18 '19 at 19:11
  • You do realize that this way is just more complicated for no reason, right? – AMC Dec 18 '19 at 19:14
  • in this particular case yes. it leads to the same desired result. – seralouk Dec 18 '19 at 19:14
  • Sure, but just because two methods give the same result doesn’t mean that they are equally as good. I see no benefit to using `choice`, it’s reinventing the wheel. – AMC Dec 18 '19 at 19:15
  • funny thing that the link you posted uses also `choice` as a second solution. `AND` the second most upvoted answer. Cheers – seralouk Dec 18 '19 at 19:23
  • Did you read the first answer carefully? It’s using it to draw numbers without replacement. If that’s what OP wants, then **yes**, they should use `choice` (although I would probably recommend `numpy.arange` over a generic `range()`). For the more general case, however, `randint` is the way to go. – AMC Dec 18 '19 at 19:58
  • Hmm actually looking at one of the answers in the other post, and the docs, it seems that you can use just `.shape[0]`, and `choice` will behave as if you had used `arange`. Useful stuff! – AMC Dec 18 '19 at 20:02
  • great. cheers. ! – seralouk Dec 19 '19 at 18:24
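
To make the difference discussed in these comments concrete, here is a small sketch (the variable names are only illustrative). `np.random.randint` draws with replacement, so it can return the same row index more than once, whereas step 2 of the algorithm asks for K different rows; `np.random.choice` with `replace=False` never repeats an index. The example draws more `randint` values than there are rows, so a repeat is guaranteed.

```python
import numpy as np

n_rows, k = 5, 5

# choice with replace=False returns k distinct indices; an integer first
# argument is treated as if it were np.arange(n_rows).
distinct = np.random.choice(n_rows, size=k, replace=False)

# randint samples with replacement: 10 draws from only 5 possible values
# must contain at least one repeated index.
with_repeats = np.random.randint(0, n_rows, size=2 * n_rows)

print(sorted(distinct.tolist()))
print(with_repeats.tolist())
```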