Questions tagged [latent-semantic-indexing]

Latent semantic indexing is an indexing and retrieval method.

Latent semantic indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words used in the same contexts tend to have similar meanings. A claimed feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between terms that occur in similar contexts.
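A minimal sketch of the SVD step described above, assuming NumPy and a hypothetical toy term-document matrix (rows are terms, columns are documents). The helper names `lsi_document_vectors` and `cosine` are illustrative, not part of any library. The sketch keeps the k largest singular values and compares documents in the resulting latent space.

```python
import numpy as np


def lsi_document_vectors(A: np.ndarray, k: int) -> np.ndarray:
    """Return one k-dimensional latent vector per document (column of A).

    Computes the truncated SVD A ~ U_k S_k V_k^T and returns the rows of V_k S_k.
    """
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Vt[:k].T has one row per document; scale each latent axis by its singular value
    return Vt[:k].T * s[:k]


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical toy term-document matrix: rows are terms, columns are documents.
terms = ["cheap", "affordable", "health", "insurance", "car"]
A = np.array([
    [1, 0, 0, 0],  # cheap
    [0, 1, 0, 0],  # affordable
    [1, 1, 0, 0],  # health
    [1, 1, 1, 0],  # insurance
    [0, 0, 1, 1],  # car
], dtype=float)

docs = lsi_document_vectors(A, k=2)
print(round(cosine(A[:, 0], A[:, 1]), 3))  # raw term space: 0.667
print(round(cosine(docs[0], docs[1]), 3))  # latent space (k = 2): 1.0
```

Documents 0 and 1 use different price words ("cheap" vs "affordable"), yet with k = 2 their latent vectors point in the same direction, because both words occur alongside the same context terms. This is the association-through-context effect the description refers to, shown on deliberately tiny data.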

52 questions
17 votes, 3 answers

What NLP tools to use to match phrases having similar meaning or semantics

I am working on a project that requires me to match a phrase or keyword against a set of similar keywords, which calls for some form of semantic analysis. An example: Relevant QT cheap health insurance affordable health insurance low cost medical…
Arun Shyam
13 votes, 3 answers

Latent Semantic Analysis concepts

I've read about using Singular Value Decomposition (SVD) to do Latent Semantic Analysis (LSA) on a corpus of texts. I understand how to do that, and I understand the mathematical concepts behind SVD. But I don't understand why it works when applied to…
stemm
10 votes, 1 answer

How do we decide the number of dimensions for latent semantic analysis?

I have been working on latent semantic analysis lately. I have implemented it in Java using the Jama package. Here is the code: Matrix vtranspose ; a = new Matrix(termdoc); termdoc = a.getArray(); a = a.transpose() ;…
CTsiddharth
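For the dimension question above, one common heuristic (an assumption here, not a rule) is to keep the smallest k whose singular values retain a chosen fraction of the total sum of squared singular values. By the Eckart-Young theorem, the squared Frobenius error of the rank-k truncation is the sum of the discarded squared singular values, so the threshold directly bounds the reconstruction error. The helper `choose_k` and its `energy` parameter are hypothetical, and the sketch uses NumPy in Python rather than Jama. Many practitioners instead tune k on the downstream retrieval or classification task.

```python
import numpy as np


def choose_k(A: np.ndarray, energy: float = 0.9) -> int:
    """Smallest k whose top-k singular values keep `energy` of sum(s**2).

    sum(s**2) equals the squared Frobenius norm of A, and the squared error of the
    rank-k truncation is the sum of the discarded s**2, so this picks the smallest k
    whose squared reconstruction error is at most (1 - energy) of the total.
    """
    s = np.linalg.svd(A, compute_uv=False)  # singular values in descending order
    retained = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(retained, energy)) + 1  # first position reaching the threshold
    return min(k, len(s))  # guard against rounding when energy is 1.0


print(choose_k(np.diag([3.0, 2.0, 1.0]), energy=0.9))  # 2 (keeps 13/14 of the energy)
```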
10 votes, 4 answers

Clustering using Latent Dirichlet Allocation algo in gensim

Is it possible to do clustering in gensim for a given set of inputs using LDA? How can I go about it?
Sharmila
7 votes, 2 answers

Need help with latent semantic indexing

I am sorry if my question sounds stupid :) Can you please recommend any pseudocode or a good algorithm for an LSI implementation in Java? I am not a math expert. I tried to read some articles on Wikipedia and other websites about LSI (latent semantic…
user238384
7 votes, 2 answers

Finding topics of an unseen document via Gensim

I am using Gensim to do some large-scale topic modeling. I am having difficulty understanding how to determine predicted topics for an unseen (non-indexed) document. For example: I have 25 million documents which I have converted to vectors in LSA…
Peter Kirby
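For the unseen-document question above, the textbook LSI approach is "folding in": project the new document's term vector q as q_hat = S_k^{-1} U_k^T q and compare it with the rows of V_k. Below is a NumPy sketch with a hypothetical 6 x 4 count matrix and hypothetical helper names `fit_lsi` and `fold_in`. gensim's `LsiModel` lets you transform a new bag-of-words vector with `lsi[bow]`, but its scaling convention may differ from this formula, so treat the correspondence as an assumption.

```python
import numpy as np


def fit_lsi(A: np.ndarray, k: int):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T  # V_k: one row per training document


def fold_in(q: np.ndarray, U_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    # Folding-in: q_hat = S_k^{-1} U_k^T q, in the same coordinates as the rows of V_k
    return (U_k.T @ q) / s_k


# Hypothetical term-document counts (6 terms x 4 documents).
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 3, 1],
    [1, 0, 0, 2],
    [0, 1, 1, 0],
], dtype=float)
U_k, s_k, V_k = fit_lsi(A, k=2)

new_doc = np.array([1, 1, 0, 0, 1, 0], dtype=float)  # unseen document, same vocabulary
q_hat = fold_in(new_doc, U_k, s_k)
# Cosine similarity between the folded-in document and every training document
sims = V_k @ q_hat / (np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat))
print(sims.argmax())  # index of the most similar training document
```

Folding a training column back in returns exactly its row of V_k, which makes a quick sanity check. Folding in does not update U_k or S_k, so after many new documents the model may need to be recomputed or updated.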
6 votes, 6 answers

Any Latent Semantic Indexing?

Is there any open source implementation of LSI in Java? I want to use that library for my project. I have seen jLSI but it implements some other model of LSI. I want a standard model.
avd
6 votes, 1 answer

Latent Semantic Analysis in Python discrepancy

I'm trying to follow the Wikipedia Article on latent semantic indexing in Python using the following code: documentTermMatrix = array([[ 0., 1., 0., 1., 1., 0., 1.], [ 0., 1., 1., 0., 0., 0., 0.], …
Jmjmh
5 votes, 1 answer

Probabilistic latent semantic analysis/Indexing - Introduction

But recently I found this link quite helpful to understand the principles of LSA without too much math. http://www.puffinwarellc.com/index.php/news-and-articles/articles/33-latent-semantic-analysis-tutorial.html. It forms a good basis on which I…
Sharmila
5 votes, 1 answer

Combining LSA/LSI with Naive Bayes for document classification

I'm new to the gensim package and vector space models in general, and I'm unsure what exactly I should do with my LSA output. To give a brief overview of my goal, I'd like to enhance a Naive Bayes classifier using topic modeling to improve…
4 votes, 2 answers

How is TF-IDF implemented in the gensim tool in Python?

From documents I found online, I figured out that the expression used to determine the term frequency and inverse document frequency weights of terms in a corpus is tf-idf(wt) = tf * log(|N|/d). I was going through the implementation…
Kai
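A plain-Python sketch of the formula as the TF-IDF question states it, with tf as the raw count of a term in a document, N the number of documents, and d the number of documents containing the term. The function name `tfidf_vectors` and the corpus are hypothetical. Libraries can differ in the details: gensim's `TfidfModel`, for instance, accepts pluggable local and global weighting functions (possibly with a different logarithm base) and normalizes vectors by default, so its numbers need not match this sketch.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]], log_base: float = math.e) -> list[dict[str, float]]:
    """Weight each term as tf * log(N / d), the formula quoted in the question.

    N is the number of documents and d is the number of documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        vectors.append({t: c * math.log(n_docs / doc_freq[t], log_base) for t, c in tf.items()})
    return vectors


# Hypothetical tokenized corpus.
corpus = [
    ["cheap", "health", "insurance"],
    ["affordable", "health", "insurance"],
    ["car", "insurance"],
]
for vec in tfidf_vectors(corpus):
    print(vec)
```

"insurance" appears in every document, so log(N/d) = log(1) = 0 and it gets weight 0 everywhere, while terms that occur in only one document get the largest weight.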
4 votes, 2 answers

Free LSI services or API to get related keywords

I've been told that Yahoo used to have a free LSI service, the Yahoo Boss API, which became paid as of July 20th, and that the Microsoft Bing search engine has a free service that offers similar, but not as good, functionality as Yahoo Boss…
Diosney
4 votes, 3 answers

LSI using gensim in python

I'm using Python's gensim library to do latent semantic indexing. I followed the tutorials on the website, and it works pretty well. Now I'm trying to modify it a bit: I want to run the LSI model each time a document is added. Here is my…
Jeff
3 votes, 3 answers

Document classification using LSA/SVD

I am trying to do document classification using Support Vector Machines (SVM). The documents are a collection of emails. I have around 3000 documents to train the SVM classifier and a test set of around 700 documents for which I need…
Ravi
3 votes, 0 answers

What is a "good" value for LSI topic coherence?

I'm using the gensim Python library to work on small corpora (around 1500 press articles each time). Let's say I'm interested in creating clusters of articles about the same news story. So for each corpus of articles I've tokenized, detected…
fbparis