
Looking at this code from here:

import numpy as np
from kmodes.kmodes import KModes

# random categorical data
data = np.random.choice(20, (100, 10))

km = KModes(n_clusters=4, init='Huang', n_init=5, verbose=1)
clusters = km.fit_predict(data)

# Print the cluster centroids
print(km.cluster_centroids_)

Does anyone know how to save the "clustering model" and apply it to new data? In other words, how can I cluster previously unseen data with a model that has already been fitted? Thanks.

cs0815

1 Answer


You can use pickle for this task.

import pickle

# Serialize the fitted KModes object (km from the question) to disk
with open('cluster_model.pickle', 'wb') as f:
    pickle.dump(km, f)

When you want to use it on new data, load the model back and call predict:

with open('cluster_model.pickle', 'rb') as f:
    km = pickle.load(f)

# If your new data is called "new_data", you can:
new_clusters = km.predict(new_data)
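
To make it clearer what "applying the model to new data" means here, below is a minimal sketch in plain Python. It assumes the default matching dissimilarity (the number of positions where two rows differ) and assigns each new row to the nearest stored mode; as far as I understand, this is roughly what predict does, but it is not the library's actual code. The helper names and the toy modes and rows are hypothetical and made up for illustration.

```python
import io
import pickle


def matching_dissimilarity(a, b):
    """Count the positions where two categorical rows differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_to_nearest_mode(rows, modes):
    """Return, for each row, the index of the mode with the fewest mismatches."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(row, modes[k]))
        for row in rows
    ]


# Hypothetical modes (stand-ins for km.cluster_centroids_) and unseen rows.
modes = [[1, 1, 1], [5, 5, 5]]
new_data = [[1, 1, 5], [5, 2, 5], [5, 5, 5]]

# Round-trip the modes through pickle in memory, as a file on disk would.
buffer = io.BytesIO()
pickle.dump(modes, buffer)
buffer.seek(0)
restored_modes = pickle.load(buffer)

# No refitting happens: the stored modes stay fixed, only labels are computed.
labels = assign_to_nearest_mode(new_data, restored_modes)
print(labels)
```

The point of the sketch is that prediction never refits anything. The new rows only need the same columns, in the same order and with the same category encoding, as the data the model was fitted on.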
artemis
  • thanks. yes I knew about pickle. so the predict method would work. does this also work for KPrototypes do you reckon? I will try soon ... – cs0815 Feb 10 '22 at 16:15
  • If this solves your problem, don't forget to mark as correct to help others in the future @cs0815 – artemis Feb 14 '22 at 20:59