I am writing a simple K-means clustering algorithm and I am trying to render a scatter plot of the sample data (rows loaded from a CSV file into a NumPy matrix X).
Let us say X is a NumPy matrix in which each row is one example with 10 features. In my case they are attributes of a network flow, including source IP address, destination IP address, source port, and destination port. I have also computed the centroids for K-means (where K is the total number of centroids). I have a list idx that holds, for each row of X, the index of the centroid to which that row belongs. For example, if row 5 of X belongs to centroid 3, then idx[4] = 3 (since indexing starts from 0). This way, each row of X, an individual data record of 10 features, belongs to exactly one centroid.
I want to draw a scatter plot of the data points in X, with a separate color for each centroid. For example, if rows 5 and 8 of X are closest to centroid 3, I want both drawn in the same color, different from the colors used for the other centroids. If I were to do it in Octave, I could write the code like this:
function plotPoints(X, idx, K)
  p = hsv(K + 1);                   % palette with K+1 colors
  c = p(idx, :);                    % one RGB row per sample, looked up by centroid index
  scatter(X(:,1), X(:,2), 15, c);   % scatter plot of the first two features
end
However, in Python I am not sure how to implement the same thing so that data samples with the same index assignment get the same color. My code currently shows all the X rows in red and all the centroids in blue, as shown below:
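import matplotlib.pyplot as plt

def plotPoints(X,idx,K,centroids):
    srcport=X[:,5]   # feature column 5: source port
    dstport=X[:,6]   # feature column 6: destination port
    fig = plt.figure()
    ax=fig.add_subplot(111,projection='3d')   # 3D axes, although only x and y are used
    ax.scatter(srcport,dstport,c='r',marker='x')   # every sample in red
    ax.scatter(centroids[:,5],centroids[:,6],c='b',marker='o', s=160)   # every centroid in blue
    ax.set_xlabel('Source port')
    ax.set_ylabel('Destination port')   # was a second set_xlabel call, which overwrote the x label
    plt.show()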
Please note: I am only plotting 2 features, on the x and y axes, and not all 10 features. I should have mentioned that earlier.
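To make the goal concrete, here is a minimal sketch of the kind of result I am after, assuming Matplotlib's scatter can take the integer array idx as `c` together with a colormap. The function name `plot_points_by_cluster`, the `xcol`/`ycol` parameters, and the choice of the `hsv` colormap (to mirror the Octave palette) are my own assumptions, and idx is assumed to hold 0-based centroid indices as described above.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_points_by_cluster(X, idx, K, centroids, xcol=5, ycol=6):
    """Scatter two feature columns of X, one color per centroid index.

    Hypothetical helper: the name and the xcol/ycol defaults are
    illustrative; columns 5 and 6 are taken to be the source and
    destination ports, as in the code above.
    """
    X = np.asarray(X)                  # also turns np.matrix into a plain array
    centroids = np.asarray(centroids)
    idx = np.asarray(idx).ravel()      # one centroid index (0..K-1) per row of X

    fig, ax = plt.subplots()
    # Integer labels passed as `c` are mapped to colors through the colormap.
    # vmin=0 / vmax=K keeps the mapping identical in both calls, and stops
    # the cyclic hsv map from giving index 0 and index K-1 the same red
    # (similar in spirit to hsv(K+1) in the Octave version).
    ax.scatter(X[:, xcol], X[:, ycol], c=idx, cmap="hsv",
               vmin=0, vmax=K, marker="x", s=15)
    # Each centroid gets the color of its own index, with a black outline.
    ax.scatter(centroids[:, xcol], centroids[:, ycol], c=np.arange(K),
               cmap="hsv", vmin=0, vmax=K, marker="o", s=160,
               edgecolors="k")
    ax.set_xlabel("Source port")
    ax.set_ylabel("Destination port")
    return fig, ax
```

Calling `plot_points_by_cluster(X, idx, K, centroids)` followed by `plt.show()` should display the figure. If idx were 1-based, as it would be in Octave, it would need to be shifted with `idx - 1` first.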