
I am trying to visualize my high-dimensional data set in two axes (components) using non-metric multi-dimensional scaling (NMDS). This is available in the scikit-learn library. Here is my code:

from sklearn.manifold import MDS

embedding = MDS(n_components=2, metric=False, n_init=2, max_iter=100,
                verbose=0, eps=0.001, n_jobs=2, random_state=101,
                dissimilarity='euclidean')
# precip = precip[0:100]

precip_transformed = embedding.fit_transform(precip)
precip_transformed

The defaults are n_init=4, max_iter=300, and n_jobs=None (which means 1). This takes forever to run even though I reduced those values and increased n_jobs. It also makes my notebook crash after a while. I should mention that my data has 20,000 rows; when I keep the commented-out line of the code (only the first 100 rows), it works. Does anyone know how I can make this run faster, or some way to make sure the notebook won't crash?
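A quick back-of-envelope check of why the notebook might crash: MDS materializes the full n×n pairwise dissimilarity matrix internally (assuming float64 storage, and that SMACOF keeps more than one n×n array live at once):

```python
# Rough memory estimate for the pairwise dissimilarity matrix that
# MDS builds for n = 20,000 samples, stored as float64 (8 bytes each).
n = 20_000
bytes_per_float = 8
gib = n * n * bytes_per_float / 2**30
print(f"{gib:.1f} GiB per copy")  # ≈ 3.0 GiB per n×n array
```

With n_jobs=2 running parallel SMACOF initializations, several such arrays can be alive simultaneously, which is enough to exhaust memory on many machines.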

  • Check your memory and, if it's not enough, set n_jobs to 1. – sascha Jan 30 '19 at 18:06
  • Did you scale your features? That can affect performance on many dimensionality-reduction techniques – G. Anderson Jan 30 '19 at 18:29
  • Instead of just selecting the first 100 rows you can sample randomly. But the algorithm has O(n^3) complexity, so you still can't use a lot of instances. – hellpanderr Jan 30 '19 at 18:59
  • As all those 20,000 rows are the watersheds I am analyzing, I cannot reduce them; I just wanted to check whether it would work on 100 rows. So you think this computation will not be possible for all 20,000 rows? @hellpanderr – ilearn Jan 30 '19 at 19:11
  • @G.Anderson They all have the same scale or units, so no, I did not scale them. – ilearn Jan 30 '19 at 19:12
  • I haven't used it, but there is a package called `megaman` https://arxiv.org/pdf/1603.02763.pdf that is supposed to be able to handle bigger datasets. – hellpanderr Jan 30 '19 at 19:43
  • Also MDS is pretty slow, have you tried other methods from `manifold`? – hellpanderr Jan 30 '19 at 19:45
  • @hellpanderr The reason I have to use non-metric multi-dimensional scaling is this recommendation that I should follow: "NMS is generally regarded as the most effective ordination method for ecological community data, as it is well suited to non-normal and categorical data." – ilearn Jan 30 '19 at 20:00
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/187602/discussion-between-sina-shabani-and-hellpanderr). – ilearn Jan 30 '19 at 20:06
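Following hellpanderr's suggestion, here is a minimal sketch of fitting NMDS on a random subsample rather than the first 100 rows. The random matrix stands in for the real `precip` data, whose shape beyond 20,000 rows is not shown in the question (10 columns assumed):

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(101)

# Stand-in for the real 20,000-row `precip` matrix (10 columns assumed;
# the question does not show the data's width).
precip = rng.random((20000, 10))

# Draw a random subsample instead of precip[0:100]; NMDS cost grows
# roughly cubically with n, so keep the sample small.
n_sample = 300
idx = rng.choice(precip.shape[0], size=n_sample, replace=False)
precip_sample = precip[idx]

# n_init=1 and n_jobs=1 keep both runtime and memory use modest.
embedding = MDS(n_components=2, metric=False, n_init=1, max_iter=100,
                eps=0.001, n_jobs=1, random_state=101,
                dissimilarity='euclidean')
precip_transformed = embedding.fit_transform(precip_sample)
print(precip_transformed.shape)  # (300, 2)
```

Repeating this over several random subsamples and checking that the ordinations agree would give some confidence that the sample is representative of all 20,000 watersheds.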

0 Answers