
I am using the scipy.cluster.hierarchy.fclusterdata function to cluster a list of vectors (each vector has 384 components).

It works nicely, but when I try to cluster a large amount of data I run out of memory and the program crashes.

How can I perform the same task without running out of memory?

My machine has 32GB RAM, Windows 10 x64, Python 3.6 (64-bit).
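Roughly what I'm doing (a simplified sketch with random placeholder data; the real input is a much larger list of vectors, which is where it crashes):

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# Placeholder data: the real input has far more rows than this.
vectors = np.random.rand(200, 384)

# One flat cluster label per vector; t and criterion are illustrative settings.
labels = fclusterdata(vectors, t=0.5, criterion='distance', metric='euclidean')
```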

jtlz2

2 Answers


You'll need to choose a different algorithm.

Hierarchical clustering needs O(n²) memory, and the textbook algorithm takes O(n³) time, so it cannot scale to large data sets.
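For example, a density-based method such as scikit-learn's DBSCAN does not need the number of clusters up front and, with a tree-based neighbour search, avoids materialising the full pairwise-distance matrix (a sketch; `eps` and `min_samples` are illustrative and must be tuned for your data):

```python
import numpy as np
from sklearn.cluster import DBSCAN

vectors = np.random.rand(500, 384)  # stand-in for the real data

# labels[i] is the cluster id of vectors[i]; -1 marks noise points.
labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(vectors)
```

Note that for 384-dimensional vectors you may want to reduce dimensionality first (e.g. with PCA), since neighbour searches degrade in high dimensions.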

Has QUIT--Anony-Mousse
  • What would you suggest? I just want an algorithm that creates clusters from a list of vectors; I don't want to specify the number of clusters to be formed. – Samuel Ferreira Oct 16 '19 at 09:16

You could have a look at

However, you will have to set up a pipeline to test different numbers of clusters. It's hard to say which algorithm will work best for you, though.
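One way to sketch such a pipeline (assuming scikit-learn is available; MiniBatchKMeans is memory-friendly, and the silhouette score picks among candidate cluster counts):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score

vectors = np.random.rand(1000, 384)  # stand-in for the real data

best_k, best_score = None, -1.0
for k in range(2, 6):  # candidate cluster counts to test
    labels = MiniBatchKMeans(n_clusters=k, n_init=3,
                             random_state=0).fit_predict(vectors)
    # Subsample the silhouette computation to keep memory and time bounded.
    score = silhouette_score(vectors, labels, sample_size=500, random_state=0)
    if score > best_score:
        best_k, best_score = k, score
```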

Gregor