
I hope you are doing well. I have the following MATLAB code, which implements K-means, and I want to implement it in Python, but I have not been able to. Can anybody please help me with that?

Dataset

0.119349659383,2765187888.188327790000,-50.272277924288,0.000010124208
0.119639999551,2780553879.583636760000,-45.173332876699,0.000015075661
0.119899673836,2765356033.223678110000,-50.327888424563,0.000010123978
0.120209965074,2780981089.939126490000,-45.152589356947,0.000015059274
0.120449679454,2765635512.158593650000,-50.363949423158,0.000010131346

MATLAB code

dataset = readmatrix('newdata.txt');

[idx,C,sumdist] = kmeans(dataset,3,'Display','final','Replicates',5);
figure
gscatter(dataset(:,1),dataset(:,2),idx,'bgm')
hold on
plot(C(:,1),C(:,2),'kx')
legend('Cluster 1','Cluster 2','Cluster 3','Cluster Centroid')

dataset_idx = [dataset, idx];   % append the cluster index as column 5

clusters = cell(3,1);
for i = 1:3
    clusters{i} = dataset_idx(dataset_idx(:,5) == i,:);
    figure;
    scatter(clusters{i}(:,1),clusters{i}(:,2))
    legend(sprintf('Cluster %d',i))
    title(sprintf('Cluster %d',i))
end


for i = 1:3
    T = clusters{i}(:,1);
    fprintf('\nCLUSTER %d:\n',i)
    DeltaT = diff(T);
    MclusterTimeseries = mean(DeltaT);
    formatSpec = 'Mean DeltaT of Cluster %d is %4e\n';
    fprintf(formatSpec,i,MclusterTimeseries)
    MclusterFrequency = mean(clusters{i}(:,2));
    formatSpec = 'Mean Frequency of Cluster %d is %4e\n';
    fprintf(formatSpec,i,MclusterFrequency)
    MclusterAmplitude = max(clusters{i}(:,3));
    formatSpec = 'Max Amplitude of Cluster %d is %4.4f\n';
    fprintf(formatSpec,i,MclusterAmplitude)
    Mcluster1PW = mean(clusters{i}(:,4));
    formatSpec = 'Mean Pulse Width of Cluster %d is %4e\n';
    fprintf(formatSpec,i,Mcluster1PW)
end
Zoe
  • I see some invalid syntax in your code if I read it as Python code. If you are looking for a reference, take a look at https://realpython.com/k-means-clustering-python/. – Dhana D. Jan 25 '22 at 02:16
  • @DhanaD. I have looked at k-means in Python, but I am unable to read the data and implement the same algorithm in Python as above. – Med FutureXAI Jan 25 '22 at 02:27
  • I think you may need to read the Python docs so you can find the equivalent syntax for your MATLAB code and migrate it to Python. These libraries may be helpful as well: numpy, scikit-learn, pandas. – Dhana D. Jan 25 '22 at 02:29
  • @DhanaD. I have been trying for the past week, but I am unable to do it. – Med FutureXAI Jan 25 '22 at 02:30

2 Answers


As @Debi Prasad Sen suggested, the fastest and easiest way to do this is to use scikit-learn's tried-and-tested implementation of the KMeans algorithm (see the scikit-learn KMeans documentation).
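A minimal sketch of the MATLAB call with scikit-learn, assuming `newdata.txt` is comma-separated with no header row (as in the sample in the question). `n_init=5` plays the role of `'Replicates',5`: the clustering is rerun from 5 starts and the run with the lowest within-cluster sum of squares is kept. The function name `matlab_style_kmeans` and its `seed` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans


def matlab_style_kmeans(data: np.ndarray, k: int = 3, replicates: int = 5, seed: int = 0):
    """Rough analogue of MATLAB's [idx, C, sumd] = kmeans(data, k, 'Replicates', replicates).

    Returns 0-based labels (MATLAB's idx is 1-based), the centroids, and the
    per-cluster sum of squared Euclidean distances (MATLAB's default sumd).
    """
    km = KMeans(n_clusters=k, n_init=replicates, random_state=seed)
    labels = km.fit_predict(data)
    centroids = km.cluster_centers_
    # Sum of squared point-to-centroid distances, one entry per cluster
    sumd = np.array([((data[labels == j] - centroids[j]) ** 2).sum() for j in range(k)])
    return labels, centroids, sumd


# Usage with the file from the question:
# dataset = np.loadtxt("newdata.txt", delimiter=",")
# idx, C, sumd = matlab_style_kmeans(dataset, k=3, replicates=5)
```

Like MATLAB's `kmeans`, this does not rescale the columns, so with this dataset the second column (values around 2.7e9) dominates the distances; standardizing first (for example with `sklearn.preprocessing.StandardScaler`) may be worth considering if that is not intended. For the plots, `matplotlib.pyplot.scatter(dataset[:, 0], dataset[:, 1], c=idx)` is a rough stand-in for `gscatter`.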

Alternatively, you could write your own implementation. Here's a simple function that I wrote in Python, per your comment:

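import numpy as np
from typing import Tuple
from scipy.spatial.distance import cdist

def kmeans(X: np.ndarray, k: int, reps: int, seed: int=17) -> Tuple[np.ndarray, np.ndarray]:
    # reps = number of assign/update iterations (not MATLAB's 'Replicates')
    rng = np.random.default_rng(seed) # 17 is my favorite number
    X = np.asarray(X, dtype=float)
    # start from k distinct rows so no two centroids coincide
    centroids = X[rng.choice(X.shape[0], size=k, replace=False), :]
    for _ in range(reps):
        # assign every point to its nearest centroid
        labels = np.argmin(cdist(X, centroids), axis=1)
        for i in range(k):
            members = X[labels == i, :]
            if len(members) > 0: # keep the old centroid if a cluster ends up empty
                centroids[i] = members.mean(axis=0)

    # final assignment so the labels match the returned centroids
    labels = np.argmin(cdist(X, centroids), axis=1)
    return (labels, centroids)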
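The function above starts from a single random initialization, and its `reps` argument counts assign/update iterations rather than restarts. MATLAB's `'Replicates',5` instead repeats the whole clustering from 5 starts and keeps the result with the smallest total within-cluster sum of squared distances. A self-contained sketch of that restart loop follows; the names `lloyd` and `kmeans_replicates` are hypothetical, and the inner loop is a compact Lloyd iteration in the same spirit as the function above.

```python
import numpy as np
from scipy.spatial.distance import cdist


def lloyd(X, k, iters, rng):
    """One k-means run: k distinct random starting rows, then `iters` assign/update steps."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(cdist(X, centroids), axis=1)
        for i in range(k):
            if np.any(labels == i):  # leave an emptied cluster's centroid where it was
                centroids[i] = X[labels == i].mean(axis=0)
    labels = np.argmin(cdist(X, centroids), axis=1)
    # Total within-cluster sum of squared distances, used to compare restarts
    cost = ((X - centroids[labels]) ** 2).sum()
    return labels, centroids, cost


def kmeans_replicates(X, k, replicates=5, iters=100, seed=17):
    """Run `lloyd` `replicates` times and keep the lowest-cost (labels, centroids, cost)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    runs = [lloyd(X, k, iters, rng) for _ in range(replicates)]
    return min(runs, key=lambda run: run[2])
```

More restarts make it less likely that every run lands in a poor local minimum, at the cost of proportionally more work.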
v0rtex20k

Refer to the following link to see how to read a text file using the pandas library in Python.
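A minimal sketch of the `readmatrix` step with pandas, assuming `newdata.txt` is comma-separated with no header row, like the sample in the question. The column names are hypothetical labels taken from the MATLAB `fprintf` messages (time, frequency, amplitude, pulse width), and `read_dataset` is a hypothetical helper name.

```python
import io

import pandas as pd

COLUMNS = ["time", "frequency", "amplitude", "pulse_width"]  # hypothetical names


def read_dataset(source) -> pd.DataFrame:
    """Read a headerless comma-separated file (path or file-like object) into a DataFrame."""
    return pd.read_csv(source, header=None, names=COLUMNS)


# The first rows from the question, used here instead of the real file
sample = io.StringIO(
    "0.119349659383,2765187888.188327790000,-50.272277924288,0.000010124208\n"
    "0.119639999551,2780553879.583636760000,-45.173332876699,0.000015075661\n"
    "0.119899673836,2765356033.223678110000,-50.327888424563,0.000010123978\n"
)
df = read_dataset(sample)  # with the real file: read_dataset("newdata.txt")
X = df.to_numpy()          # plain float array, the counterpart of `dataset` in MATLAB
```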
For the K-means implementation, you can use the scikit-learn library, or you can build it from scratch with NumPy; refer to this article.
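Whichever route produces the labels, the statistics loop at the end of the MATLAB script maps onto NumPy roughly as below. This sketch assumes `X` is the n-by-4 data array (time, frequency, amplitude, pulse width, in that order) and `labels` holds 0-based cluster indices such as those returned by scikit-learn; `cluster_summary` is a hypothetical helper name.

```python
import numpy as np


def cluster_summary(X: np.ndarray, labels: np.ndarray, k: int) -> list:
    """Per-cluster statistics mirroring the MATLAB loop; prints them and returns a list of dicts."""
    results = []
    for i in range(k):
        members = X[labels == i]  # rows of cluster i, in their original order
        stats = {
            "mean_delta_t": np.mean(np.diff(members[:, 0])),  # mean(diff(T))
            "mean_frequency": members[:, 1].mean(),
            "max_amplitude": members[:, 2].max(),
            "mean_pulse_width": members[:, 3].mean(),
        }
        n = i + 1  # 1-based numbering, as in the MATLAB output
        print(f"\nCLUSTER {n}:")
        print(f"Mean DeltaT of Cluster {n} is {stats['mean_delta_t']:e}")
        print(f"Mean Frequency of Cluster {n} is {stats['mean_frequency']:e}")
        print(f"Max Amplitude of Cluster {n} is {stats['max_amplitude']:.4f}")
        print(f"Mean Pulse Width of Cluster {n} is {stats['mean_pulse_width']:e}")
        results.append(stats)
    return results
```

As in MATLAB, a cluster with a single member has an empty `diff`, so its mean DeltaT comes out as NaN.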

Debi Prasad