Hello, I am a machine learning newbie and I need some help with unsupervised clustering of high-dimensional data. My data has over 15 dimensions and around 50,000 to 80,000 rows. It looks something like this (15 participants with an almost equal number of rows each, and 15 features):
Participant | time | feature 1 | feature 2... |
---|---|---|---|
1 | 0.05 | val | val |
1 | 0.10 | val | val |
2 | 0.05 | val | val |
2 | 0.10 | val | val |
2 | 0.15 | val | val |
The data consists of many participants. Each participant has multiple rows, and each row is a timestamped set of feature values. My goal is to cluster this data by participant and make inferences based on these clusters. The problem is that there are many rows for each participant, so I cannot represent a participant with a single point, which makes clustering them seem difficult.
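To make the "single point per participant" problem concrete, here is a minimal sketch of one possible workaround: collapsing each participant's rows into summary statistics so that every participant becomes one vector. The column names (`Participant`, `time`, `feature_1`, ...), the choice of statistics, and the synthetic data are assumptions for illustration only, not my real data, and summarizing this way discards the time ordering.

```python
import numpy as np
import pandas as pd


def summarize_participants(df, id_col="Participant", time_col="time"):
    """Collapse all rows of each participant into one row of summary statistics."""
    feature_cols = [c for c in df.columns if c not in (id_col, time_col)]
    summary = df.groupby(id_col)[feature_cols].agg(["mean", "std", "min", "max"])
    # Flatten the (feature, statistic) column MultiIndex into names like "feature_1_mean".
    summary.columns = [f"{feat}_{stat}" for feat, stat in summary.columns]
    # std is NaN for a participant with a single row; treat that as zero spread.
    return summary.fillna(0.0)


# Synthetic stand-in for the real table (assumed shape: 15 participants, 15 features).
rng = np.random.default_rng(0)
n_participants, rows_each, n_features = 15, 200, 15
toy = pd.DataFrame(
    rng.normal(size=(n_participants * rows_each, n_features)),
    columns=[f"feature_{i + 1}" for i in range(n_features)],
)
toy.insert(0, "time", np.tile(np.arange(1, rows_each + 1) * 0.05, n_participants))
toy.insert(0, "Participant", np.repeat(np.arange(1, n_participants + 1), rows_each))

participant_summary = summarize_participants(toy)
print(participant_summary.shape)  # (15, 60): one row per participant, 4 stats x 15 features
```

With only one row per participant, the resulting table is small enough for any clustering method, but I am not sure whether throwing away the time dimension like this is acceptable.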
I would like help with:
What would be the best way to cluster this data so that I can make inferences about each participant?
Which clustering technique should I use? I have tried scikit-learn's `KMeans` and `MeanShift`, as well as other libraries, but they take too long and crash my system.
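For context on the scalability side, this is a rough sketch of a lighter-weight alternative I could try instead of plain `KMeans`: scikit-learn's `MiniBatchKMeans`, which fits on small batches of rows, followed by a per-participant breakdown of cluster membership. The synthetic data, `n_clusters=5`, and `batch_size=1024` are placeholder assumptions, not tuned values.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import MiniBatchKMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumed: ~60,000 rows, 15 features, 15 participants).
rng = np.random.default_rng(0)
n_rows, n_features = 60_000, 15
X = rng.normal(size=(n_rows, n_features))
participants = rng.integers(1, 16, size=n_rows)

# Put all features on a comparable scale before distance-based clustering.
X_scaled = StandardScaler().fit_transform(X)

# MiniBatchKMeans updates the centroids from small random batches of rows,
# which is usually much cheaper than running full KMeans on every row.
model = MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=3, random_state=0)
labels = model.fit_predict(X_scaled)

# Fraction of each participant's rows that falls into each cluster.
profile = pd.crosstab(
    pd.Series(participants, name="Participant"),
    pd.Series(labels, name="cluster"),
    normalize="index",
)
print(profile.round(2))
```

Each row of `profile` sums to 1, so it describes a participant by how their rows are distributed across clusters, which might be one way to compare participants with each other.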
Sorry if this is a bit difficult to understand; I will try my best to answer your questions. Thank you in advance for the help. If this question is very similar to an existing one, please let me know (I was not able to find it).
Thank you :)