
I need to know whether a coherence score of 0.4 is good or bad. I am using LDA as the topic modelling algorithm.

What is the average coherence score in this context?

Tomerikoo
User Mohamed

3 Answers


Coherence measures the relative distance between words within a topic. There are two major types: C_v, which typically falls in 0 < x < 1, and UMass, which falls roughly in -14 < x < 14. It is rare to see a coherence of 1 or above 0.9 unless the words being measured are either identical words or bigrams. For example, "united" and "states" would likely return a coherence score of ~.94, while "hero" and "hero" would return a coherence of 1. The overall coherence score of a topic is the average of the distances between its words. I try to attain a .7 in my LDAs when using C_v; I think that indicates a strong topic correlation. I would say:

  • .3 is bad

  • .4 is low

  • .55 is okay

  • .65 might be as good as it is going to get

  • .7 is nice

  • .8 is unlikely, and

  • .9 is probably wrong
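
For reference, computing a C_v score in gensim is a one-liner once you have a trained model. Here is a minimal sketch, assuming a trained LdaModel called lda_model plus the tokenized texts and the dictionary it was trained on (all hypothetical names):

from gensim.models import CoherenceModel

# lda_model, texts (tokenized documents) and dictionary are assumed to exist.
cm = CoherenceModel(model=lda_model, texts=texts, dictionary=dictionary, coherence='c_v')
print(cm.get_coherence())            # average coherence across all topics
print(cm.get_coherence_per_topic())  # per-topic scores, handy for spotting weak topics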

Low coherence fixes:

  • adjust your parameters: alpha = .1, beta = .01 or .001, random_state = 123, etc. (a sketch of these settings in gensim follows the code below)

  • get better data

  • at .4 you probably have the wrong number of topics; check out https://datascienceplus.com/evaluation-of-topic-modeling-topic-coherence/ for what is known as the elbow method. It gives you a graph of the optimal number of topics for greatest coherence in your data set. I'm using Mallet, which produces pretty good coherence; here is code to check coherence for different numbers of topics:

import gensim
from gensim.models import CoherenceModel
from pprint import pprint
import matplotlib.pyplot as plt

# mallet_path and id2word are assumed to be defined earlier.
# Note: gensim.models.wrappers.LdaMallet is available in gensim 3.x only.
def compute_coherence_values(dictionary, corpus, texts, limit, start=2, step=3):
    """
    Compute c_v coherence for various number of topics

    Parameters:
    ----------
    dictionary : Gensim dictionary
    corpus : Gensim corpus
    texts : List of input texts
    limit : Max num of topics

    Returns:
    -------
    model_list : List of LDA topic models
    coherence_values : Coherence values corresponding to the LDA model with respective number of topics
    """
    coherence_values = []
    model_list = []
    for num_topics in range(start, limit, step):
        model = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=num_topics, id2word=id2word)
        model_list.append(model)
        coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence='c_v')
        coherence_values.append(coherencemodel.get_coherence())

    return model_list, coherence_values
# Can take a long time to run.
model_list, coherence_values = compute_coherence_values(dictionary=id2word, corpus=corpus, texts=data_lemmatized, start=2, limit=40, step=6)
# Show graph
limit=40; start=2; step=6;
x = range(start, limit, step)
plt.plot(x, coherence_values)
plt.xlabel("Num Topics")
plt.ylabel("Coherence score")
plt.legend(["coherence_values"], loc='best')
plt.show()

# Print the coherence scores
for m, cv in zip(x, coherence_values):
    print("Num Topics =", m, " has Coherence Value of", round(cv, 4))
    
# Select the model (index 3 corresponds to num_topics=20 here) and print the topics
optimal_model = model_list[3]
model_topics = optimal_model.show_topics(formatted=False)
pprint(optimal_model.print_topics(num_words=10))
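
As for the parameter bullet above, here is a minimal sketch of how those values map onto gensim's LdaModel, where beta is called eta. The corpus/id2word names match the code above; the values are just the starting points suggested earlier, not universally optimal settings:

import gensim

lda_model = gensim.models.LdaModel(
    corpus=corpus,
    id2word=id2word,
    num_topics=20,      # pick this via the elbow method above
    alpha=0.1,          # Dirichlet prior on the per-document topic distribution
    eta=0.01,           # Dirichlet prior on the per-topic word distribution (the "beta" above)
    random_state=123,   # for reproducibility
)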

I hope this helps :)

SciPy
Sara
  • any idea what the alpha, beta parameters you mention correspond to in Gensim's lda model? (https://radimrehurek.com/gensim/models/ldamodel.html) – Vincent Sep 04 '19 at 17:15
  • @Vincent Alpha is the Dirichlet-prior concentration parameter of the per-document topic distribution, whereas Beta is the same parameter of the per-topic word distribution. Please refer to this link: https://www.thoughtvector.io/blog/lda-alpha-and-beta-parameters-the-intuition/ – SVK Dec 04 '19 at 15:03
  • Can you suggest a paper where the scores and levels you've provided are set in experiments? – seeiespi Dec 12 '19 at 15:03

In addition to the excellent answer from Sara:

UMass coherence measures how often two words (Wi, Wj) were seen together in the corpus. It is defined as:

score(Wi, Wj) = log [ (D(Wi, Wj) + EPSILON) / D(Wi) ]

Where: D(Wi, Wj) is the number of times words Wi and Wj appeared together in the corpus

D(Wi) is the number of times word Wi appeared in the corpus

EPSILON is a small value (like 1e-12) added to the numerator to avoid zero values

If Wi and Wj never appear together, this results in log(0), which will break the universe. The EPSILON value is kind of a hack to fix that.

In conclusion, you can get a value anywhere from a very large negative number all the way up to approximately 0. The interpretation is the same as Sara wrote: the greater the number the better, though exactly 0 would obviously be wrong.
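
A toy example with made-up counts, just to show the mechanics:

import math

d_wi = 100      # documents containing Wi (hypothetical count)
d_wi_wj = 25    # documents containing both Wi and Wj (hypothetical count)
EPSILON = 1e-12

print(math.log((d_wi_wj + EPSILON) / d_wi))  # log(25/100) ≈ -1.386

# If the words never co-occur, EPSILON keeps log() defined:
print(math.log((0 + EPSILON) / d_wi))        # ≈ -32.2, a very large negative number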

Muhammad Ali

I would just like to add that good or bad is relative to the corpus you are working on and the scores for the other clusters.

In the link that Sara provided, the article shows 33 topics as optimal with a coherence score of ~0.33, but as the author mentions there may be repeated terms within that cluster. In that case you would have to compare terms/snippets from the optimal cluster decomposition against those of a lower-coherence one to see whether the results are more or less interpretable.

Of course you should adjust the parameters of your model, but the score is contextually dependent, and I don't think you can necessarily say a specific coherence score clustered your data optimally without first understanding what the data looks like. That said, as Sara mentioned, ~1 or ~0 are probably wrong.

You could compare your model against a benchmark dataset; if it has a higher coherence, then you have a better gauge of how well your model is working. A sketch of that kind of side-by-side inspection is below.
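
Here is a minimal sketch of such an inspection, reusing the (hypothetical) model_list and coherence_values from the first answer:

# Compare the top terms of the best-scoring model against a lower-scoring one
# to judge which decomposition is actually more interpretable.
best = coherence_values.index(max(coherence_values))
for idx in sorted({best, max(best - 1, 0)}):
    print(f"coherence = {coherence_values[idx]:.3f}")
    for topic_id, terms in model_list[idx].show_topics(num_topics=5, num_words=8):
        print(f"  topic {topic_id}: {terms}")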

This paper was helpful to me: https://rb.gy/kejxkz