I have data in the form [UID obj1 obj2..] x timestamp, and I want to cluster it in Python using KMeans from sklearn. Where should I start?

EDIT:

I'm trying to cluster users based on clickstream data and classify them by usage pattern.

Siddharth Shah
  • Could you [create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve)? – Anton Protopopov Feb 09 '16 at 04:38
  • Could you give an example of what you are trying to achieve? – Neil Feb 09 '16 at 06:46
  • Duplicate question: http://stackoverflow.com/questions/3503668/how-to-cluster-time-series-data-using-k-means-algorithm – pavel Feb 09 '16 at 08:57
  • scikit-learn has great implementations of [k-means](http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans) and other clustering algorithms – America Feb 17 '16 at 17:02

2 Answers

You can derive additional features from the raw data using methods such as RFM analysis (RFM = recency, frequency, monetary value).

For example:

How often does the user log in?

When did the user last log in?
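
As a rough sketch of this idea, the snippet below derives two RFM-style features per user (frequency and recency) from an event log and clusters them with scikit-learn's KMeans. The column names (`user_id`, `event`, `timestamp`), the sample rows, and `n_clusters=2` are assumptions made for illustration; they do not come from the asker's data.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical clickstream log: one row per event (all values are made up).
events = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "b", "c", "c", "c", "c", "d"],
    "event": ["login", "view", "view", "login", "view",
              "login", "view", "cart", "view", "login"],
    "timestamp": pd.to_datetime([
        "2016-01-30", "2016-01-31", "2016-02-01",
        "2016-02-05", "2016-02-08",
        "2016-02-06", "2016-02-07", "2016-02-08", "2016-02-09",
        "2016-01-15",
    ]),
})

# Reference point for recency: the latest timestamp in the log.
now = events["timestamp"].max()

# One row per user: how many events they produced and when they were last seen.
features = events.groupby("user_id").agg(
    frequency=("event", "size"),
    last_seen=("timestamp", "max"),
)
features["recency_days"] = (now - features["last_seen"]).dt.total_seconds() / 86400
features = features.drop(columns="last_seen")

# Scale the features so neither one dominates the Euclidean distance.
X = StandardScaler().fit_transform(features)

# n_clusters=2 is arbitrary here; in practice pick it with e.g. the elbow method.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
features["cluster"] = kmeans.fit_predict(X)

print(features)
```

A monetary feature could be added the same way if the events carry a purchase amount.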

kingbase

You can use the Python library Retentioneering (GitHub), which lets you cluster your users based on clickstream data with a single command. You can also specify any target events you are interested in and explore the resulting clusters using interactive graphs.

data.rete.get_clusters(method='kmeans',
                       feature_type='tfidf',
                       n_clusters=8,
                       ngram_range=(1, 2),
                       plot_type='cluster_bar',
                       targets=['payment_done', 'cart']);

[Figure: results of user clustering]
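
For comparison, a similar clustering can be approximated with scikit-learn alone. The sketch below is not Retentioneering's implementation; it only assumes that TF-IDF weights over event unigrams and bigrams are a reasonable analogue of `feature_type='tfidf'` with `ngram_range=(1,2)`. The column names, sample events, and `n_clusters=2` are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical clickstream: one row per event, in chronological order per user.
events = pd.DataFrame({
    "user_id": ["u1"] * 4 + ["u2"] * 4 + ["u3"] * 2 + ["u4"] * 3,
    "event": [
        "main", "catalog", "cart", "payment_done",
        "main", "catalog", "cart", "payment_done",
        "main", "lost",
        "main", "catalog", "lost",
    ],
    "timestamp": pd.date_range("2016-02-09", periods=13, freq="min"),
})

# Turn each user's events into one space-separated "sentence".
sequences = (
    events.sort_values("timestamp")
    .groupby("user_id")["event"]
    .agg(" ".join)
)

# Unigrams and bigrams of event names; \S+ treats each event name as one token.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")
X = vectorizer.fit_transform(sequences)

# n_clusters=2 is arbitrary for this tiny sample.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
clusters = pd.Series(labels, index=sequences.index, name="cluster")

print(clusters)
```

Bigrams such as `cart payment_done` capture the order of adjacent events, which plain event counts would lose.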

Next, you can explore the resulting behavioral clusters with an interactive graph:

clus_0 = data.rete.filter_cluster(0)
clus_0.rete.plot_graph(thresh=0.1,
                       weight_col='user_id',
                       targets={'lost': 'red',
                                'payment_done': 'green'})

[Figure: graph visualization example]