My dataset consists of records of users' music streams. I have around 100 different music genres, and I would like to cluster them according to the age distribution of their listeners.

To be clearer, user ages are divided into "blocks" (A1: 0-10 years; A2: 11-20 years; ...; A6: 61+), so an example of the data I would like to cluster is the following:
Pop: 0.05 A1; 0.3 A2; 0.35 A3; 0.2 A4; 0.05 A5; 0.05 A6
Rock: 0.05 A1; 0.2 A2; 0.2 A3; 0.1 A4; 0.15 A5; 0.1 A6

I would like to obtain clusters of genres with similar distributions. How can I do this in Python? Can I just treat each genre as a data point in a 6-dimensional space, or should I use something more refined? For example, can I use a customized distance between distributions in a clustering algorithm?

Thank you

Aeb
  • See https://stackoverflow.com/questions/33721996/how-to-specify-a-distance-function-for-clustering for how to specify a custom distance function, e.g. the RMS deviation (https://en.wikipedia.org/wiki/Root-mean-square_deviation). – Carlos Horn Jul 04 '22 at 10:48

1 Answer

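If you have prior knowledge that lets you design your own distance function, the hierarchical clustering routines in scipy.cluster.hierarchy should support it: linkage accepts either a metric (including a Python callable) or a precomputed condensed distance matrix.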
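
As a rough sketch of the custom-distance route, the snippet below clusters a few genres with the Jensen-Shannon distance, which is only one possible choice of distance between distributions and is not prescribed by the question. The genre rows are made-up illustration data, and the average linkage method and the cut into two clusters are assumptions as well.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import jensenshannon, pdist

# Hypothetical age distributions, invented for illustration.
# Rows are genres, columns are age blocks A1..A6; each row sums to 1.
genres = ["Pop", "Rock", "Classical", "Jazz"]
X = np.array([
    [0.05, 0.30, 0.35, 0.20, 0.05, 0.05],
    [0.05, 0.25, 0.35, 0.20, 0.10, 0.05],
    [0.02, 0.05, 0.08, 0.15, 0.30, 0.40],
    [0.02, 0.05, 0.10, 0.18, 0.30, 0.35],
])

# Condensed pairwise distance matrix; any callable f(u, v) -> float
# can be passed as the metric, here the Jensen-Shannon distance.
D = pdist(X, metric=jensenshannon)

# Average linkage works with arbitrary dissimilarities
# (centroid, median and ward assume Euclidean distances).
Z = linkage(D, method="average")

# Cut the tree into at most two flat clusters (assumed number).
labels = fcluster(Z, t=2, criterion="maxclust")

for genre, label in zip(genres, labels):
    print(genre, label)
```

With the real data, X would hold one row per genre (about 100 rows), and the number of clusters or the cut height would need to be chosen, for example by inspecting the dendrogram.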

My opinion: judging from the problem statement, classic clustering methods should be fine. Treat each genre as a point in 6-dimensional space; at least one of KMeans, spectral clustering, DBSCAN, ... (with suitable parameters) should do the trick.
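
A minimal sketch of the plain approach, assuming scikit-learn is available: each genre is a 6-dimensional vector of age-block shares, and KMeans groups them by Euclidean distance. The data rows and the choice of two clusters are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical age distributions, invented for illustration.
# Rows are genres, columns are age blocks A1..A6; each row sums to 1.
genres = ["Pop", "Rock", "Classical", "Jazz"]
X = np.array([
    [0.05, 0.30, 0.35, 0.20, 0.05, 0.05],
    [0.05, 0.25, 0.35, 0.20, 0.10, 0.05],
    [0.02, 0.05, 0.08, 0.15, 0.30, 0.40],
    [0.02, 0.05, 0.10, 0.18, 0.30, 0.35],
])

# n_clusters=2 is an assumption; random_state fixes the initialisation
# so that repeated runs give the same result.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

for genre, label in zip(genres, labels):
    print(genre, label)
```

For around 100 genres, the number of clusters could be chosen by comparing a few values with a criterion such as the silhouette score (sklearn.metrics.silhouette_score).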

rikyeah