How to find the most repeated word in a vector with R

Question

I am learning how to analyze data sets using R, but I got stuck trying to interpret what the different factors (category_id, see the picture) mean.

Basically, "one" is a data set that has a variable called "title". (Image: how the data set looks.)

As you can see, each value in "title" is a string of several words, such as "The Greatest Showman".

What I would like to know is which word is the most frequent across the entire "title" variable.

I'll give you a hint: You might want to use the `dplyr` package, and you might want to `group_by` your factor column, and `summarize` by counting occurrences using `n()` — Mako212, Feb 05 '18 at 23:10
Please, provide a reproducible example. This thread helps on how to do that: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Georgery, Feb 05 '18 at 23:12

score 4 · Answer 1 · answered Feb 23 '21 at 13:41

Use the `Mode()` function from the `DescTools` package.

Mode(x, na.rm = FALSE)

For example, if you have a vector:

> vec = c("Apple", "Apple", "Apple", "Apple", "Ball", "Ball", "Ball", "Cat")
> Mode(vec)
[1] "Apple"
attr(,"freq")
[1] 4

or simply,

> Mode(vec)[1]
[1] "Apple"

Antonios · Answer 2 · 2018-02-05T23:33:52.693

score 3

vec <- c("A", "B", "A", "C", "B", "B")
freq <- table(vec)
# Find most frequent word (returns every word tied for the maximum)
names(freq)[freq == max(freq)]
# Find occurrences of most frequent word
max(freq)
# See frequency table of all words
freq

edited Feb 05 '18 at 23:33

answered Feb 05 '18 at 23:17

Antonios


One addendum: `which.max()` returns only the 1st maximum. So, this is incomplete, if there is another `"A"` in your example vector. – Georgery Feb 05 '18 at 23:21
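To make the point about ties concrete, here is a minimal sketch in base R. The vector is a made-up example in which "A" and "B" are tied, and the names `first_only` and `all_tied` are only illustrative.

```r
# Hypothetical example vector: "A" and "B" both occur three times
vec <- c("A", "B", "A", "C", "B", "B", "A")
freq <- table(vec)

# which.max() gives the position of the first maximum only
first_only <- names(freq)[which.max(freq)]

# Comparing every count against max() keeps all tied values
all_tied <- names(freq)[freq == max(freq)]

print(first_only)  # "A"
print(all_tied)    # "A" "B"
```

If only one answer is needed, `which.max()` is fine; if ties matter, the comparison against `max()` is the safer choice.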

LachlanO · Answer 3 · 2018-02-05T23:29:48.867

This is hard to answer without giving us the structure of your data frame. I don't even know what you mean by 'word' and I don't see why the fact that you've selected a category id is relevant. Regardless, if there is a column called word and you want to find the most common occurrence in this column, you can use table to work out counts for each unique entry in the column word. From there just dig out the table heading with the highest count.

freq <- table(one$word)                        #Work out counts for each word
maxFreq <- max(freq)                           #Find what the maximum count is
mostCommonWord <- names(freq)[freq == maxFreq] #Find all matches of the maximum value
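If the column holds whole titles rather than single words, the titles would need to be split into words before counting. The following is a rough sketch under assumptions: the data frame `one` below is a made-up stand-in for the asker's data, "word" is taken to mean a run of letters, digits or apostrophes, and case is ignored.

```r
# Hypothetical stand-in for the asker's data frame `one`
one <- data.frame(
  title = c("The Greatest Showman", "The Post", "Lady Bird", "The Shape of Water"),
  stringsAsFactors = FALSE
)

# Lower-case the titles, then split on anything that is not a letter, digit or apostrophe
words <- unlist(strsplit(tolower(one$title), "[^a-z0-9']+"))
words <- words[words != ""]   # drop empty strings left by leading separators

freq <- table(words)                              # counts for each word
mostCommonWord <- names(freq)[freq == max(freq)]  # every word tied for the maximum

print(mostCommonWord)  # "the"
```

With real titles the top result will often be a filler word such as "the", so removing a list of common words before calling `table()` may be worthwhile.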
