
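I have a dataset (LDA output) that looks like this:

    library(dplyr)
    library(tidytext)  # tidy() method for LDA topic models

    lda_tt <- tidy(ldaOut)

    # keep the 10 highest-beta terms per topic; top_n() keeps ties, which is
    # why topics 2 and 3 end up with 11 terms
    lda_tt <- lda_tt %>%
        group_by(topic) %>%
        top_n(10, beta) %>%
        ungroup() %>%
        arrange(topic, -beta)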

        topic  term        beta
     1      1  council     0.044069733
     2      1  report      0.020086205
     3      1  budget      0.016918569
     4      1  polici      0.01646605
     5      1  term        0.015051927
     6      1  annual      0.014938797
     7      1  control     0.014316583
     8      1  audit       0.013637803
     9      1  rate        0.012732765
    10      1  fund        0.011997421
    11      2  debt        0.033760856
    12      2  plan        0.030379431
    13      2  term        0.02925229
    14      2  fiscal      0.021836885
    15      2  polici      0.017802904
    16      2  mayor       0.015548621
    17      2  transpar    0.013175692
    18      2  relat       0.012997722
    19      2  capit       0.012463813
    20      2  long        0.011989227
    21      2  remain      0.011989227
    22      3  parti       0.031795751
    23      3  elect       0.029929187
    24      3  govern      0.025496098
    25      3  mayor       0.023046232
    26      3  district    0.014588364
    27      3  public      0.014471704
    28      3  administr   0.013596752
    29      3  budget      0.011730188
    30      3  polit       0.011730188
    31      3  seat        0.010563586
    32      3  state       0.010563586
    33      4  budget      0.037069484
    34      4  revenu      0.025043026
    35      4  account     0.018459577
    36      4  oper        0.01721546
    37      4  tax         0.015867667
    38      4  debt        0.014416198
    39      4  compani     0.013690464
    40      4  expenditur  0.012135318
    41      4  consolid    0.011305907
    42      4  increas     0.010891202
    43      5  invest      0.026534237
    44      5  elect       0.023341538
    45      5  administr   0.022296654
    46      5  improv      0.02189031
    47      5  develop     0.019162003
    48      5  project     0.017826874
    49      5  transport   0.016375647
    50      5  local       0.016317598
    51      5  infrastr    0.014401978
    52      5  servic      0.014111733

I want to create five plots, one per topic, with the terms in each plot ordered by beta. This is my code:

    library(ggplot2)

    lda_tt %>%
        mutate(term = reorder(term, beta)) %>%
        ggplot(aes(term, beta, fill = factor(topic))) +
        geom_bar(alpha = 0.8, stat = "identity", show.legend = FALSE) +
        facet_wrap(~ topic, scales = "free") +
        coord_flip()

I get this graph:

[Figure: Terms by beta]

As you can see, despite the sorting, the terms are not ordered by beta. For example, "budget" should be the top term in topic 4, "invest" should be the top term in topic 5, and so on. How can I sort the terms within each topic in each panel? There are several questions on Stack Overflow about sorting in ggplot, but none of them helped me solve the problem.

Michael

1 Answer


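The link suggested by Tung provides a solution to the problem. The cause is that `reorder()` creates a single factor whose levels are shared by all facets. Terms that occur in more than one topic (e.g. "budget", "debt", "term", "polici") get only one position, based on their mean beta across topics, so they cannot be ordered correctly in every panel. The fix is to code each term-topic pair as a distinct factor level. The `mutate()` step appends "_" and the topic number to each term and takes the levels from the row order, which already runs by topic and decreasing beta thanks to `arrange(topic, -beta)`. The `rev()` call puts the highest beta at the top after `coord_flip()`. The `scale_x_discrete()` call then displays only the term, without "_" and the topic number. The following code generates a faceted graph with proper sorting.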

    library(dplyr)
    library(ggplot2)

    lda_tt %>%
        # one factor level per term-topic pair; rev() puts the highest beta at
        # the top after coord_flip() (relies on arrange(topic, -beta) above)
        mutate(term = factor(paste(term, topic, sep = "_"),
                             levels = rev(paste(term, topic, sep = "_")))) %>%
        ggplot(aes(term, beta, fill = factor(topic))) +
        geom_bar(alpha = 0.8, stat = "identity", show.legend = FALSE) +
        facet_wrap(~ topic, scales = "free") +
        coord_flip() +
        # remove "_" and the topic number from the axis labels
        scale_x_discrete(labels = function(x) gsub("_.+$", "", x))

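
To see why the plain `reorder()` call fails, it may help to inspect the factor levels directly. The sketch below uses base R only. `df` is a hypothetical six-row subset of the table above, made up of topic 1's top three terms plus three topic 4 terms, with the values copied from the question. It mimics `lda_tt` after `arrange(topic, -beta)`.

```r
# Hypothetical subset of lda_tt (values copied from the question's table),
# already sorted by topic and decreasing beta, as arrange(topic, -beta) does.
df <- data.frame(
  topic = c(1, 1, 1, 4, 4, 4),
  term  = c("council", "report", "budget", "budget", "revenu", "debt"),
  beta  = c(0.044069733, 0.020086205, 0.016918569,
            0.037069484, 0.025043026, 0.014416198)
)

# reorder() ranks each term by its MEAN beta over all topics (FUN = mean by
# default), so "budget" gets a single level shared by the topic 1 and 4 panels.
global <- reorder(df$term, df$beta)
print(levels(global))

# One level per term-topic pair; rev() turns the descending row order into
# ascending levels, which coord_flip() draws from bottom to top.
key <- paste(df$term, df$topic, sep = "_")
per_topic <- factor(key, levels = rev(key))
print(levels(per_topic))

# The axis labels after stripping "_<topic>", as the scale_x_discrete() call does.
print(gsub("_.+$", "", levels(per_topic)))
```

In the first result, "budget" comes after "report", so it is drawn above "report" in the topic 1 panel. This happens even though its topic 1 beta (0.0169) is lower than that of "report" (0.0201), because its level is ranked by its mean beta over topics 1 and 4 (about 0.027). In the second result, the levels for each topic run from lowest to highest beta.
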
Michael
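
The `tidy()` method for LDA models used in the question comes from the tidytext package. That package also provides `reorder_within()` and `scale_x_reordered()` for this same paste-and-strip pattern, with "___" as the default separator. A minimal sketch follows, assuming a tidytext version that includes these helpers. The data frame is again a hypothetical subset of the question's table; with the real data you would start from `lda_tt` directly.

```r
library(dplyr)
library(ggplot2)
library(tidytext)  # reorder_within() and scale_x_reordered()

# Hypothetical subset of lda_tt (values copied from the question's table).
lda_subset <- data.frame(
  topic = c(1, 1, 1, 4, 4, 4),
  term  = c("council", "report", "budget", "budget", "revenu", "debt"),
  beta  = c(0.044069733, 0.020086205, 0.016918569,
            0.037069484, 0.025043026, 0.014416198)
)

plot_data <- lda_subset %>%
  # pastes "___<topic>" onto each term and orders the levels by beta,
  # so "budget" in topics 1 and 4 becomes two separate levels
  mutate(term = reorder_within(term, beta, topic))

p <- ggplot(plot_data, aes(term, beta, fill = factor(topic))) +
  geom_col(alpha = 0.8, show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip() +
  scale_x_reordered()  # strips the "___<topic>" suffix from the axis labels

print(p)
```

Because `reorder_within()` orders the levels by beta itself, this version does not depend on the row order or on `rev()`.
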