Applying a matrix decomposition for classification using a saved W matrix

Question

I'm performing an NMF decomposition on a tf-idf input in order to perform topic analysis.

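from sklearn import decomposition

def decomp(tfidfm, topic_count):
    model = decomposition.NMF(init="nndsvd", n_components=topic_count, max_iter=500)
    # Naming follows this post: H is documents x topics, W is topics x terms
    H = model.fit_transform(tfidfm)
    W = model.components_
    return W, H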

This returns W, a model definition consisting of topic-to-term assignments, and H, a document-to-topic assignment matrix.

So far so good, I can use H to classify documents based on their association via term frequency to a list of topics which in turn are also based on their association to term frequency.

I'd like to save the topic-term associations to disk so I can reapply them later, and have adopted the method described here [https://stackoverflow.com/questions/8955448] to store the sparse-matrix representation of W.

So what I'd like to do now, is perform the same process, only fixing the topic-definition matrix W.

In the documentation, it appears that I can set W in the calling parameters, something along the lines of:

def applyModel(tfidfm,W,topic_count):
    model = decomposition.NMF(init="nndsvd", n_components=topic_count, max_iter=500)
    H = model.fit_transform(X=tfidfm, W=W)
    W = model.components_
    return W, H

And I've tried this, but it doesn't appear to work.

I tested this by compiling a W matrix from a differently sized vocabulary and feeding it into the applyModel function. The shapes of the resulting matrices should be determined by the W model (or at least, that is what I intend), but this isn't the case.

The short version of this question is: How can I save the topic-model generated from a matrix decomposition, such that I can use it to classify a different document set than the one used to originally generate it?

In other terms, if V=WH, then how can I return H, given V and W?

ForceBru · Accepted Answer · 2016-10-17T18:05:00.217

The initial equation is $V = WH$, and we solve it for $H$ like this: $W^{-1}V = W^{-1}WH = H$.

Here $W^{-1}$ denotes the inverse of the matrix $W$, which exists only if $W$ is nonsingular.

The multiplication order is, as always, important. If the order is changed, so that you had $V = HW$, you'd need to multiply $V$ by the inverse of $W$ the other way round: $VW^{-1} = HWW^{-1} = H$.
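
As a quick numerical check of both multiplication orders ($V = WH$ and $V = HW$), here is a minimal NumPy sketch. The matrices are made-up illustrative values, not data from the question, and W is deliberately square and nonsingular so that its inverse exists.

```python
import numpy as np

# Hypothetical small matrices, chosen only for illustration.
W = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # square, det = 5, so invertible
H = np.array([[1.0, 0.0, 2.0],
              [0.5, 4.0, 1.0]])

# Case 1: V = W H, so multiply by the inverse of W on the left.
V_left = W @ H
H_from_left = np.linalg.inv(W) @ V_left

# Case 2: V = H2 W, so multiply by the inverse of W on the right.
H2 = H.T                              # shape (3, 2), so that H2 @ W is defined
V_right = H2 @ W
H_from_right = V_right @ np.linalg.inv(W)

print(np.allclose(H_from_left, H))    # True
print(np.allclose(H_from_right, H2))  # True
```

When only the product is needed, np.linalg.solve(W, V_left) gives the same result for the first case without forming the inverse explicitly.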

edited Oct 17 '16 at 18:05

answered Oct 17 '16 at 17:47

Of course! Maths wins again. I'll post the solution I used to perform the matrix multiplication/inverse to get H. It looks as though I'm getting meaningful results for the application I'm applying it to. I will mark as answered shortly but would like to leave open to invite any additional answers - I was anticipating something baked into scikit and don't want to replicate a process if there's already something there. – Thomas Kimber Oct 17 '16 at 19:46

score 0 · Answer 2 · answered Oct 19 '16 at 14:33

For completeness, here's the rewritten applyModel function that takes into account the answer from ForceBru (it uses an import of scipy.sparse.linalg):

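from scipy.sparse import linalg

def applyModel(tfidfm, W):
    # V = H W  =>  H = V W^-1; inv() requires W to be square and nonsingular
    H = tfidfm * linalg.inv(W)
    return H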

This returns (assuming an aligned vocabulary) a mapping of documents to topics H based on a pregenerated topic-model W and document feature matrix V generated by tfidf.
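
The explicit inverse only exists when W is square, whereas a topic-term matrix from NMF normally has shape (topics, terms) with far fewer topics than terms. As a hedged sketch of an alternative that stays inside scikit-learn, `non_negative_factorization` with `update_H=False` solves for the document-topic matrix while holding the topic-term matrix fixed. Note that scikit-learn's naming is the reverse of the one used in this thread: there, `components_` is called H and the document-topic matrix is called W. The random data and the names `V_train`, `V_new`, `topic_term` and `doc_topic` below are illustrative assumptions, not part of the original code.

```python
import numpy as np
from sklearn.decomposition import NMF, non_negative_factorization

rng = np.random.default_rng(0)
V_train = rng.random((20, 8))   # stand-in for the original tf-idf matrix (docs x terms)
V_new = rng.random((5, 8))      # new documents over the SAME vocabulary

# Fit once; components_ is the topic-term matrix (topics x terms).
model = NMF(n_components=3, init="nndsvd", max_iter=500)
model.fit(V_train)
topic_term = model.components_  # this is the matrix that would be saved to disk

# Later: solve only for the document-topic matrix, keeping topic_term fixed.
doc_topic, _, _ = non_negative_factorization(
    V_new, H=topic_term, n_components=topic_term.shape[0],
    update_H=False, max_iter=500,
)

print(doc_topic.shape)  # (5, 3)
```

If the fitted estimator itself is still available (for example, reloaded from a pickle), calling `model.transform(V_new)` is the more direct route. Either way the result stays non-negative, which a plain matrix inverse does not guarantee.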
