
I'm trying to visualise the output of a neural network (implemented in Keras) using t-SNE, but I get a MemoryError when calling fit_transform. I'm running the code on Windows 10.

Code:

from keras.models import Model
from sklearn.manifold import TSNE

# Symbolic output tensor of every layer in the encoder
layer_outputs = [layer.output for layer in encoder.layers]

# layer_outputs[3] has shape (None, 32)
v_model = Model(inputs=encoder.input, outputs=layer_outputs[3])

output = v_model.predict(x_train)

tsne = TSNE(n_components=2, random_state=0)
y = tsne.fit_transform(output)  # MemoryError raised here
...
matchifang
  • Windows or Unix? – mkaran Jun 26 '17 at 14:06
  • @mkaran windows – matchifang Jun 26 '17 at 14:07
  • 1
    How many predictions do you have in `output`? Have you tried less samples? Have you tried reducing the dimensionality with PCA? – petezurich Jun 26 '17 at 14:13
  • And what do you think we can do here? Who knows how big `output` is? And for the [curious](https://github.com/scikit-learn/scikit-learn/issues/7089). – sascha Jun 26 '17 at 14:13
  • @mkaran Care to elaborate? This does not sound right. There might be differences in 32-bit mode, but who cares nowadays. There should be no problem with a vanilla 64-bit build available on python.org – sascha Jun 26 '17 at 14:16
  • @sascha I am assuming **32 bit Python** here, not 64 bit. I have stumbled upon this limitation many times, unfortunately. Take a look at [this](https://stackoverflow.com/questions/18282867/python-32-bit-memory-limits-on-64bit-windows) and [this](https://msdn.microsoft.com/en-us/library/aa366778.aspx#physical_memory_limits_windows_10). There is a workaround with the [`LARGEADDRESSAWARE` flag](https://www.coveros.com/increasing-the-amount-of-memory-available-to-a-32-bit-windows-application/) but I would not recommend it. – mkaran Jun 26 '17 at 14:22
  • @mkaran But this has nothing to do with Python; it's OS-level stuff (applying to every C++ program too). But yeah, I think most people use 64-bit builds and there should be no problem at all. When saying something that sounds that bad, you should make clearer what you are assuming (32-bit). – sascha Jun 26 '17 at 14:25
  • @sascha Yes, it is OS-level stuff and not Python specific. Most people I know still use 32 bit - and usually Python 2 - that's why I wrongfully assumed 32 bit. I should have clarified in the first comment. Removed the first comment as misleading. – mkaran Jun 26 '17 at 14:29
  • Btw @matchifang: are you using 32 bit or 64 bit Python? – mkaran Jun 26 '17 at 14:29
  • @mkaran I'm using 64bit Python – matchifang Jun 26 '17 at 14:33
  • 1
    @matchifang So why no reaction to the core-question: how big is your output data? Hidden in my first comment, there is probably also a solution (depending a bit on your data-size and your basic skills around python). – sascha Jun 26 '17 at 14:34
  • @mkaran Apologies. I reduced the size of output to only take the first few items, and it worked. So the problem was indeed from the output size. – matchifang Jun 26 '17 at 14:36
  • @matchifang Good. My link explains the space complexity a bit, and you can do the math for your case. It seems the more memory-efficient implementation is not ready yet, so despite my earlier comment, it's maybe not a ready solution to your problem. – sascha Jun 26 '17 at 14:37
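To put numbers on the space complexity the comments hint at: the exact t-SNE method materialises dense n × n matrices of pairwise affinities, so memory grows quadratically with the number of samples. A rough back-of-the-envelope sketch (the 60,000-sample figure is an assumption, roughly MNIST-sized; the question never states the actual size):

```python
# Back-of-the-envelope estimate for ONE dense n x n float64 matrix,
# as built by exact t-SNE; n = 60_000 is an assumed MNIST-like size.
n = 60_000
gib = n * n * 8 / 1024**3  # 8 bytes per float64 entry
print(round(gib, 1))  # 26.8 (GiB) for a single matrix
```

At that scale even one such matrix exceeds typical desktop RAM, which is why subsampling made the error go away.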

1 Answer


You need to reduce the size of `output` (the number of samples) to something your system's memory can handle, for example by running t-SNE on a random subsample of the rows.

In addition, you can use Principal Component Analysis (PCA) to reduce the dimensionality of `output` before you feed it into t-SNE. See here: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
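A minimal sketch of both steps, using random data in place of `v_model.predict(x_train)` (the (2000, 32) shape and the subsample/PCA sizes are illustrative assumptions, not values from the question):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in for `output = v_model.predict(x_train)`; shape is assumed.
rng = np.random.default_rng(0)
output = rng.standard_normal((2000, 32))

# 1) Subsample rows so t-SNE's pairwise computations fit in memory.
n_samples = 500
idx = rng.choice(output.shape[0], size=n_samples, replace=False)
subset = output[idx]

# 2) Optionally reduce dimensionality with PCA before t-SNE.
reduced = PCA(n_components=16, random_state=0).fit_transform(subset)

# 3) Run t-SNE on the smaller, lower-dimensional array.
y = TSNE(n_components=2, random_state=0).fit_transform(reduced)
print(y.shape)  # (500, 2)
```

Embedding only a subsample is usually enough for visualisation; if you need coordinates for all points, you have to fit on a machine with more memory or wait for a more memory-efficient implementation.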

petezurich