My raw data looks like:

    df =
          long  lat  long  lat  long  lat  long  lat
    1     11    6    15    19   23    27   30    34
    2     12    7    16    20   24    28   31    35
    3     13    8    17    21   25    29   32    36
    ...
    96    14    9    18    22   26    30   33    37
Here the first column (1, 2, 3, ..., 96) is the "taxi_id", which means we have 96 cars. The other columns give the locations of each car, read as consecutive (long, lat) pairs.
Example: the taxi labelled 1 has the locations (11, 6), (15, 19), (23, 27), (30, 34).
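To make the pairing concrete, here is a minimal sketch that reshapes the wide rows into per-taxi trajectories. It assumes the coordinates are held in a NumPy array with the taxi_id kept separately; the names `taxi_ids`, `coords` and `trajectories` are hypothetical and only the first three rows of the table are used.

```python
import numpy as np

# Hypothetical sample: the first three rows of the table above,
# with taxi_id stored separately from the coordinate columns.
taxi_ids = np.array([1, 2, 3])
coords = np.array([
    [11, 6, 15, 19, 23, 27, 30, 34],
    [12, 7, 16, 20, 24, 28, 31, 35],
    [13, 8, 17, 21, 25, 29, 32, 36],
])

# Read consecutive columns as (long, lat) pairs.
# The result has shape (n_taxis, n_points, 2).
trajectories = coords.reshape(len(coords), -1, 2)

# Trajectory of taxi 1 as a list of [long, lat] points.
print(trajectories[0].tolist())
```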
So, I need to cluster them to see the most common trajectories used by these taxi drivers.
To do that, I calculated "some" distance matrix between the trajectories, converted it into a similarity matrix, and passed the resulting matrix to Affinity Propagation.
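The exact distance is not important for the question, but as a rough sketch of this step: the code below assumes the distance between two trajectories is the mean Euclidean distance between corresponding points, and that the similarity is simply the negative distance. Both choices, and the helper names `trajectory_distance_matrix` and `to_similarity`, are illustrative assumptions rather than the actual computation used.

```python
import numpy as np

def trajectory_distance_matrix(trajectories):
    """Pairwise distance between trajectories of equal length.

    Assumed metric: mean Euclidean distance between corresponding points.
    `trajectories` has shape (n_taxis, n_points, 2).
    """
    # Broadcast to all pairs: shape (n, n, n_points, 2).
    diff = trajectories[:, None, :, :] - trajectories[None, :, :, :]
    # Euclidean distance per point, then average over the points.
    return np.sqrt((diff ** 2).sum(axis=-1)).mean(axis=-1)

def to_similarity(dist):
    """Assumed conversion: larger distance means lower similarity."""
    return -dist

# Hypothetical sample: taxis 1, 2 and 3 from the table above.
trajectories = np.array([
    [[11, 6], [15, 19], [23, 27], [30, 34]],
    [[12, 7], [16, 20], [24, 28], [31, 35]],
    [[13, 8], [17, 21], [25, 29], [32, 36]],
], dtype=float)

D = trajectory_distance_matrix(trajectories)
X = to_similarity(D)  # square (n_taxis, n_taxis) similarity matrix
```

One detail worth checking with a matrix like this: scikit-learn's `AffinityPropagation` treats its input as precomputed similarities only when `affinity="precomputed"` is set; with the default `affinity="euclidean"` the rows of `X` are treated as feature vectors.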
Affinity Propagation code:
    from sklearn.cluster import AffinityPropagation

    af = AffinityPropagation(preference=-6).fit(X)
    cluster_centers_indices = af.cluster_centers_indices_
    labels = af.labels_

    # Some code to calculate number of clusters (3 in this case)
    # Some code to check which "taxi_id" related to clusters
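The two commented-out steps could look roughly like the sketch below. The `labels` and `taxi_ids` arrays here are hypothetical stand-ins (in the real code they would be `af.labels_` and the taxi_id column), so this shows the idea and not the actual omitted code.

```python
import numpy as np

# Hypothetical stand-ins for af.labels_ and the taxi_id column.
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2])
taxi_ids = np.array([1, 2, 3, 20, 21, 45, 46, 47])

# Number of clusters = number of distinct labels.
n_clusters = len(np.unique(labels))

# Map each cluster label to the taxi_ids assigned to it.
clusters = {int(k): taxi_ids[labels == k].tolist() for k in np.unique(labels)}

print(n_clusters)
print(clusters)
```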
And the final data, with the cluster label of each taxi, looks like:
    final_df =
    cluster  taxi_id  long  lat
             1        11    22
    0        2        33    44
             3        55    66
             ...      ...   ...
             45       12    13
    2        46       14    15
             47       16    17
I want to evaluate my clustering, but I do not know how. I did not predict anything, so how can I use the sklearn evaluation metrics? I cannot even work out what exactly to evaluate. Maybe the distance between two clusters (CD)? Do you have any ideas or example code for how to proceed?