
I'm trying to visualise the output of a neural network (implemented in Keras) using t-SNE, but I get a MemoryError when calling fit_transform. I'm running the code on Windows 10.

Code:

from keras.models import Model
from sklearn.manifold import TSNE

# Symbolic output tensor of every layer in the encoder
layer_outputs = [layer.output for layer in encoder.layers]

# layer_outputs[3] has shape (None, 32)
v_model = Model(inputs=encoder.input, outputs=layer_outputs[3])

output = v_model.predict(x_train)

tsne = TSNE(n_components=2, random_state=0)
y = tsne.fit_transform(output)  # MemoryError raised here
...
matchifang
  • Windows or Unix? – mkaran Jun 26 '17 at 14:06
  • @mkaran windows – matchifang Jun 26 '17 at 14:07
  • 1
    How many predictions do you have in `output`? Have you tried less samples? Have you tried reducing the dimensionality with PCA? – petezurich Jun 26 '17 at 14:13
  • And what do you think we can do here? Who knows how big `output` is? And for the [curious](https://github.com/scikit-learn/scikit-learn/issues/7089). – sascha Jun 26 '17 at 14:13
  • @mkaran Care to elaborate? This does not sound right. There might be differences in 32-bit mode, but who cares nowadays. There should be no problem with a vanilla 64-bit build available on python.org – sascha Jun 26 '17 at 14:16
  • @sascha I am assuming **32 bit Python** here, not 64 bit. I have stumbled upon this limitation many times, unfortunately. Take a look at [this](https://stackoverflow.com/questions/18282867/python-32-bit-memory-limits-on-64bit-windows) and [this](https://msdn.microsoft.com/en-us/library/aa366778.aspx#physical_memory_limits_windows_10). There is a workaround with the [`LARGEADDRESSAWARE` flag](https://www.coveros.com/increasing-the-amount-of-memory-available-to-a-32-bit-windows-application/) but I would not recommend it. – mkaran Jun 26 '17 at 14:22
  • @mkaran But this has nothing to do with Python; it's OS-level stuff (applying to every C++ program too). But yeah, I think most people use 64-bit builds and there should be no problem at all. When saying something that sounds that bad, you should make clearer what you are assuming (32-bit). – sascha Jun 26 '17 at 14:25
  • @sascha Yes, it is OS-level stuff and not Python specific. Most people I know still use 32 bit - and usually Python 2 - that's why I wrongfully assumed 32 bit. I should have clarified in the first comment. Removed the first comment as misleading. – mkaran Jun 26 '17 at 14:29
  • Btw @matchifang: are you using 32 bit or 64 bit Python? – mkaran Jun 26 '17 at 14:29
  • @mkaran I'm using 64bit Python – matchifang Jun 26 '17 at 14:33
  • 1
    @matchifang So why no reaction to the core-question: how big is your output data? Hidden in my first comment, there is probably also a solution (depending a bit on your data-size and your basic skills around python). – sascha Jun 26 '17 at 14:34
  • @mkaran Apologies. I reduced the size of output to only take the first few items, and it worked. So the problem was indeed from the output size. – matchifang Jun 26 '17 at 14:36
  • @matchifang Good. My link explains the space complexity a bit, and you can do the math for your case. It seems the more memory-efficient implementation is not ready yet, so despite my earlier comment, it's maybe not a ready solution to your problem. – sascha Jun 26 '17 at 14:37
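To put numbers on the space complexity the comments hint at: the exact t-SNE method materialises dense n × n matrices of pairwise affinities, so memory grows quadratically with the number of samples. A rough back-of-the-envelope sketch (the 60,000-sample figure is an assumption, roughly MNIST-sized; the question never states the actual size):

```python
# Back-of-the-envelope estimate for ONE dense n x n float64 matrix,
# as built by exact t-SNE; n = 60_000 is an assumed MNIST-like size.
n = 60_000
gib = n * n * 8 / 1024**3  # 8 bytes per float64 entry
print(round(gib, 1))  # 26.8 (GiB) for a single matrix
```

At that scale even one such matrix exceeds typical desktop RAM, which is why subsampling made the error go away.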

1 Answer


You need to reduce the size of `output` (the number of samples) to something your system's memory can handle, for example by running t-SNE on a random subsample of the rows.

In addition, you can use Principal Component Analysis (PCA) to reduce the dimensionality of `output` before you feed it into t-SNE. See here: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
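A minimal sketch of both steps, using random data in place of `v_model.predict(x_train)` (the (2000, 32) shape and the subsample/PCA sizes are illustrative assumptions, not values from the question):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in for `output = v_model.predict(x_train)`; shape is assumed.
rng = np.random.default_rng(0)
output = rng.standard_normal((2000, 32))

# 1) Subsample rows so t-SNE's pairwise computations fit in memory.
n_samples = 500
idx = rng.choice(output.shape[0], size=n_samples, replace=False)
subset = output[idx]

# 2) Optionally reduce dimensionality with PCA before t-SNE.
reduced = PCA(n_components=16, random_state=0).fit_transform(subset)

# 3) Run t-SNE on the smaller, lower-dimensional array.
y = TSNE(n_components=2, random_state=0).fit_transform(reduced)
print(y.shape)  # (500, 2)
```

Embedding only a subsample is usually enough for visualisation; if you need coordinates for all points, you have to fit on a machine with more memory or wait for a more memory-efficient implementation.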

petezurich