0

Is there a way to find if a customer review is specifically about a particular subject ? How do i accomplish this using NLP or NLTK ? A customer review of an ecommerce company could talk about how fast/slow the shipping is, how good/bad the product quality is.. etc. Now, If I have to identify categorize the reviews into two categories, how do I achieve that ?

1). Slow shipping 2). Bad quality

Meenakshi
  • 13
  • 4

1 Answers1

0

You would use a technique called topic modelling to extract the hidden topics from your (presumably large) dataset of customer reviews. LDA (Latent Dirichlet Allocation) is an often used algorithm for identifying topics in underlying text.

It might help to keep in mind the following two principles

  • Every document (customer review) is a mixture of topics
  • Every topic is a mixture of words

Sample code (using Gensim, a very widely used Python library for topic modelling)

import gensim
from pprint import pprint

# .. Data preparation code ..

model = gensim.models.ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=10)
pprint(model.print_topics())

The print_topics() above prints the top keywords for each topic (based on their importance). There are alternative ways to do this, as posted by several SO users here.

You may want to refer to this detailed tutorial for a complete code sample.

You may want to refer to this question on topic modelling on hotel reviews.

I hope this helps you.

user799188
  • 13,965
  • 5
  • 35
  • 37