I'm trying to code this algorithm, but I'm struggling with it, and step 6 is confusing me. My code so far is at the bottom.
- Set a positive value for K.
- Select K different rows from the data matrix at random.
- For each of the selected rows:
  a. Copy its values to a new list, let us call it c. Each element of c is a number. (At the end of step 3, you should have the lists c_1, c_2, …, c_K. Each of these should have the same number of columns as the data matrix.)
- For each row i in the data matrix:
  a. Calculate the Manhattan distance between data row i and each of the lists c_1, c_2, …, c_K.
  b. Assign row i to the cluster of the nearest c. For instance, if the nearest c is c_3, then assign row i to cluster 3 (i.e. you should have a list whose ith entry is equal to 3; let's call this list S).
- If the previous step does not change S, stop.
- For each k = 1, 2, …, K:
  a. Update c_k. Each element j of c_k should be equal to the median of column j, but only taking into consideration those rows that have been assigned to cluster k.
- Go to Step 4.
Notice that in the above, K is not the same thing as k: K is the total number of clusters, while k is the index of one cluster (k = 1, 2, …, K).
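The steps above can be sketched in Python roughly as follows. This is only one possible reading of the description, assuming the data matrix is a list of rows (each a list of numbers). The helper `manhattan`, the optional `seed` parameter, and the choice to leave an empty cluster's centre unchanged are assumptions that are not part of the original steps.

```python
import random
from statistics import median


def manhattan(a, b):
    """Sum of absolute differences between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def clustering(matrix, K, seed=None):
    """Sketch of the steps above; returns (S, centres)."""
    rng = random.Random(seed)  # seed is a hypothetical extra for repeatable runs

    # Steps 2-3: pick K distinct rows at random and copy them into c_1..c_K.
    chosen = rng.sample(range(len(matrix)), K)
    centres = [list(matrix[i]) for i in chosen]

    S = None  # S[i] will hold the cluster number (1..K) of row i
    while True:
        # Step 4: assign every row to the cluster of its nearest centre.
        new_S = []
        for row in matrix:
            distances = [manhattan(row, c) for c in centres]
            new_S.append(distances.index(min(distances)) + 1)

        # Step 5: stop when the assignments did not change.
        if new_S == S:
            return S, centres
        S = new_S

        # Step 6: move each centre to the column-wise median of its rows.
        for k in range(1, K + 1):
            members = [row for row, s in zip(matrix, S) if s == k]
            if members:  # assumption: an empty cluster keeps its old centre
                centres[k - 1] = [median(col) for col in zip(*members)]
        # Step 7: the while loop goes back to step 4.
```

Calling `clustering(matrix, 2)` on a small matrix returns the list S (one cluster number per row) together with the final lists c_1 and c_2.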
# This is what I have so far:
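def clustering(matrix, k):
    for i in range(k):  # k is an integer, so loop over range(k), not k itself
        pass  # the steps above still need to go here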
I'm stuck on how it would choose the rows randomly, and I also don't understand what steps 5 and 6 mean. Could someone explain?
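For the random choice, the standard library's `random.sample` picks distinct items, which matches "K different rows". Steps 5 and 6 can be tried in isolation on a toy example. The snippet below is only an illustration under assumptions: the data in `matrix` and the values of `S` and `previous_S` are made up, and the variable names are hypothetical.

```python
import random
from statistics import median

matrix = [[1, 1], [1, 2], [2, 1], [10, 10], [10, 11], [11, 10]]  # made-up data

# Step 2: random.sample returns K distinct items, so no row index repeats.
K = 2
chosen = random.sample(range(len(matrix)), K)
centres = [list(matrix[i]) for i in chosen]  # step 3: copies of those rows

# Step 5: S is the list of cluster numbers produced by step 4. Keep the
# previous S and stop as soon as a new pass produces exactly the same list.
previous_S = [1, 1, 1, 2, 2, 2]
S = [1, 1, 1, 2, 2, 2]
converged = (S == previous_S)  # True here, so the loop would stop

# Step 6 for k = 2: keep only the rows labelled 2, then take the median
# of each column of those rows.
members = [row for row, s in zip(matrix, S) if s == 2]
c_2 = [median(column) for column in zip(*members)]
print(c_2)  # [10, 10]
```

In other words, step 5 is the stopping test (nothing moved between clusters), and step 6 recomputes each c_k from the rows currently assigned to cluster k.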