You would use a technique called topic modelling to extract the hidden topics from your (presumably large) dataset of customer reviews. LDA (Latent Dirichlet Allocation) is a widely used algorithm for discovering the topics that underlie a collection of texts.
It helps to keep the following two principles in mind:
- Every document (customer review) is a mixture of topics
- Every topic is a mixture of words
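To make the two principles concrete, here is a toy sketch of the generative story LDA assumes, in plain Python with no Gensim. The topic names, vocabulary and weights are made up, and the topics are fixed by hand rather than drawn from a Dirichlet prior as they would be in full LDA. Each review draws its own topic mixture from a Dirichlet distribution, and each word is produced by first picking a topic from that mixture and then a word from that topic. Fitting an LDA model runs this in reverse: it estimates the mixtures and the topics from the observed words.

```python
import random

# Hypothetical topics. Principle 2: every topic is a probability
# distribution over words (fixed by hand here, not learned).
TOPICS = {
    "cleanliness": {"room": 0.4, "clean": 0.3, "dirty": 0.2, "bathroom": 0.1},
    "service": {"staff": 0.5, "friendly": 0.2, "rude": 0.2, "desk": 0.1},
    "food": {"breakfast": 0.5, "coffee": 0.3, "buffet": 0.2},
}


def sample_dirichlet(alpha, k, rng):
    """Draw a k-dimensional probability vector from a symmetric Dirichlet(alpha)
    by normalising independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_review(n_words, alpha=0.5, seed=None):
    """Generate one toy 'review' (a bag of words) the way LDA assumes reviews arise."""
    rng = random.Random(seed)
    names = list(TOPICS)
    # Principle 1: every review gets its own mixture of topics.
    mixture = sample_dirichlet(alpha, len(names), rng)
    words = []
    for _ in range(n_words):
        # Pick a topic according to this review's topic mixture...
        topic = rng.choices(names, weights=mixture)[0]
        # ...then pick a word according to that topic's word distribution.
        vocab = TOPICS[topic]
        words.append(rng.choices(list(vocab), weights=list(vocab.values()))[0])
    return dict(zip(names, mixture)), words


mixture, words = generate_review(8, seed=1)
print(mixture)  # topic proportions for this review, summing to 1
print(words)    # the generated words
```

A small `alpha` (below 1) tends to give each review only a few dominant topics, which matches the intuition that a single review usually talks about one or two aspects.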
Sample code (using Gensim, a widely used Python library for topic modelling):
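```python
import gensim
from pprint import pprint
# .. Data preparation code ..
# (it must produce `dictionary`, a gensim Dictionary, and `corpus`,
# a list of bag-of-words vectors, one per review)
# Train an LDA model with 10 topics
model = gensim.models.ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=10)
# Print the top keywords for each topic
pprint(model.print_topics())
```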
The `print_topics()` call above prints the top keywords for each topic, ranked by their weight (probability) within that topic. There are alternative ways to do this, as posted by several SO users here.
For a complete code sample, see this detailed tutorial. You may also find this question on topic modelling of hotel reviews useful.
I hope this helps you.