I am required to write a program that implements k-means clustering for a given dataset (I roughly understand how the k-means algorithm works). Since I want my program to be generic, I'd like to understand the following terms:
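For reference, here is a minimal sketch of the standard k-means loop (Lloyd's algorithm). It is written so that nothing depends on a fixed number of columns. It assumes Euclidean distance, initial centroids picked at random from the existing rows, and each row stored as a tuple of floats. The function names (`kmeans`, `squared_distance`, `mean`) are hypothetical and do not come from any library.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance; works for any dimension d = len(a).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points: List[Point]) -> Point:
    # Component-wise mean of a non-empty list of d-dimensional points.
    d = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(d))

def kmeans(data: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    rng = random.Random(seed)
    # Assumption: initialise centroids by sampling k distinct rows.
    centroids = rng.sample(data, k)
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point (row) goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in data
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(data, new_labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean(members) if members else centroids[c])
        if new_labels == labels and new_centroids == centroids:
            break  # converged: nothing changed in this iteration
        labels, centroids = new_labels, new_centroids
    return centroids, labels

# Usage with four 2-dimensional points forming two obvious groups.
data = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.2, 7.9)]
centroids, labels = kmeans(data, k=2)
```

Because distances and means are computed over `zip`/`range(d)`, the same code handles 2 columns or 10 columns. Only the number of clusters `k` has to be chosen by the caller.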
For a given dataset with 100 rows and 10 columns (assuming each column is a feature), how do I identify the following parameters:
- dimension: How do I know the dimension of this dataset?
- data point: Does it mean that every cell `[row][col]` is a data point, or is the whole row one data point (a vector of values)?
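Under the usual convention for k-means input (one row per observation, one column per feature), the two quantities can be read straight off the table's shape, as in the sketch below. The dimension is the number of columns (10 here), and each whole row is one data point, a vector in 10-dimensional space. A single cell `[row][col]` is just one coordinate of that point. The random 100×10 table is a hypothetical stand-in for the real data.

```python
import random

# Hypothetical stand-in for the real file: 100 rows x 10 feature columns.
rng = random.Random(42)
dataset = [[rng.random() for _ in range(10)] for _ in range(100)]

# Each row is one data point; the whole row is a vector of 10 feature values.
n_points = len(dataset)        # number of data points (rows)
dimension = len(dataset[0])    # number of features (columns)

first_point = dataset[0]       # one data point: a 10-dimensional vector
single_value = dataset[0][3]   # one cell: the 4th feature (index 3) of point 0

# Sanity check: every row must have the same number of features.
assert all(len(row) == dimension for row in dataset)
print(n_points, dimension)     # prints: 100 10
```

In a generic k-means program, `dimension` would typically be computed this way from the input instead of being hard-coded.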