I am doing topic modeling in R and need to remove certain characters from the results. Specifically, bullet points (•) remain in my list of top terms even though I try to filter them out:
```r
library(tm)           # SimpleCorpus, VectorSource, DocumentTermMatrix
library(topicmodels)  # LDA
library(tidytext)     # tidy() for LDA models, stop_words
library(dplyr)

USAID_stops <- c("performance", "final", "usaidgov", "kaves", "evaluation", "*", "[[:punct:]]", "U\2022")

#for (i in 1:length(chapters_1)) {
a <- SimpleCorpus(VectorSource(chapters_1[1]))
dtm_questions <- DocumentTermMatrix(a)
report_topics <- LDA(dtm_questions, k = 4)
topic_weights <- tidy(report_topics, matrix = "beta")

top_terms <- topic_weights %>%
  group_by(topic) %>%
  slice_max(beta, n = 10) %>%
  ungroup() %>%
  arrange(topic, -beta) %>%
  filter(!term %in% stop_words$word) %>%
  filter(!term %in% USAID_stops)
```
```
topic term      beta
<int> <chr>     <dbl>
    1 chain     0.009267748
    2 •         0.009766040
    2 chain     0.009593995
    2 change    0.008294549
    3 nutrition 0.017117040
    3 related   0.009621772
    3 strategy  0.008523203
    4 •         0.021312755
    4 chain     0.010974153
    4 ftf       0.008146484
```
The bullet points (•) in topics 2 and 4 are still there. How can I remove them, and at which step of the pipeline?
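A minimal sketch of a fix at the filtering step, assuming the rest of the pipeline stays the same. Two details in the code above explain why the bullets survive: `%in%` compares whole strings exactly, so `"[[:punct:]]"` in `USAID_stops` is never treated as a regular expression, and `"U\2022"` is not R's escape for the bullet character (that is `"\u2022"`). The sketch below matches punctuation-only terms with a Unicode-aware regex instead, and filters *before* `slice_max()` so each topic still keeps up to 10 terms. The small `topic_weights` tibble is made-up stand-in data, and `is_punct_only()` is a hypothetical helper, not part of any package.

```r
library(dplyr)

# Made-up stand-in for tidy(report_topics, matrix = "beta"); not real model output.
topic_weights <- tibble(
  topic = c(1L, 1L, 2L, 2L, 2L),
  term  = c("chain", "*", "\u2022", "chain", "change"),
  beta  = c(0.009, 0.002, 0.010, 0.0096, 0.0083)
)

# "\u2022" is the R escape for the bullet character U+2022.
USAID_stops <- c("performance", "final", "usaidgov", "kaves", "evaluation", "\u2022")

# Hypothetical helper: TRUE for terms made only of Unicode punctuation (\p{P},
# which includes U+2022) or symbols (\p{S}); perl = TRUE enables \p{...} classes.
is_punct_only <- function(x) grepl("^[\\p{P}\\p{S}]+$", x, perl = TRUE)

top_terms <- topic_weights %>%
  # Drop unwanted terms first, so slice_max() still returns up to n terms per topic.
  filter(!term %in% USAID_stops,
         !is_punct_only(term)) %>%
  group_by(topic) %>%
  slice_max(beta, n = 10) %>%
  ungroup() %>%
  arrange(topic, desc(beta))

print(top_terms)
```

In the real pipeline, the `!term %in% stop_words$word` condition from tidytext can go into the same `filter()` call.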
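Alternatively, the bullets can be stripped from the raw text before the corpus is built, so they never enter the vocabulary that `LDA()` models. A minimal base-R sketch, assuming `chapters_1` is a character vector of raw chapter text; the sample strings and the `clean_text()` helper are hypothetical. tm's `removePunctuation()` with `ucp = TRUE` should also catch U+2022, since it belongs to the Unicode punctuation category, but this sketch does not depend on tm.

```r
# Made-up stand-in for the question's chapters_1 (raw text, one element per chapter).
chapters_1 <- c("\u2022 Value chain analysis \u2022 Nutrition strategy",
                "\u2022 Climate change and value chains")

# Hypothetical helper: replace every run of Unicode punctuation (\p{P}, which
# includes the bullet U+2022) or symbols (\p{S}) with a space, so words on either
# side are not glued together, then squeeze repeated whitespace.
clean_text <- function(x) {
  x <- gsub("[\\p{P}\\p{S}]+", " ", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}

chapters_clean <- clean_text(chapters_1)
print(chapters_clean)
```

The corpus would then be built from the cleaned text, e.g. `SimpleCorpus(VectorSource(chapters_clean[1]))`. Note that this pattern also splits hyphenated words such as "value-chain"; narrowing it to `"\u2022"` removes only the bullets.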