2

I am learning how to analyze data sets using R, but I got stuck interpreting what the different factors (category_id, see the picture) mean.

Basically, "one" is a data set that has a variable called "title". [Image: How the data set looks]

As you can see, each value in "title" is a multi-word string, such as "The Greatest Showman".

What I would like to know is the most frequent word in the entire "title" variable.

Alan Wallace
  • 1
    I'll give you a hint: You might want to use the `dplyr` package, and you might want to `group_by` your factor column, and `summarize` by counting occurrences using `n()` – Mako212 Feb 05 '18 at 23:10
  • 4
    `table(vector)` – Onyambu Feb 05 '18 at 23:11
  • 4
    Please, provide a reproducible example. This thread helps on how to do that: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Georgery Feb 05 '18 at 23:12

3 Answers

4

Use the `Mode()` function from the `DescTools` package.

Mode(x, na.rm = FALSE)

For example, if you have a vector:

> library(DescTools)
> vec = c("Apple", "Apple", "Apple", "Apple", "Ball", "Ball", "Ball", "Cat")
> Mode(vec)
[1] "Apple"
attr(,"freq")
[1] 4

or simply,

> Mode(vec)[1]
[1] "Apple"
Abdul Basit Khan
3
vec <- c("A", "B", "A", "C", "B", "B")
freq <- table(vec)  # frequency table, computed once
# Find the most frequent word (returns every word tied for the maximum)
names(freq)[as.vector(freq) == max(freq)]
# Find the number of occurrences of the most frequent word
max(freq)
# See the frequency table of all words
freq
Antonios
  • One addendum: `which.max()` returns only the 1st maximum. So, this is incomplete, if there is another `"A"` in your example vector. – Georgery Feb 05 '18 at 23:21
-1

This is hard to answer without seeing the structure of your data frame. I don't know what you mean by 'word', and I don't see why the fact that you've selected a category id is relevant. Regardless, if there is a column called `word` and you want to find its most common entry, you can use `table` to work out the count of each unique entry in that column. From there, just pull out the table names with the highest count.

freq <- table(one$word)                        # Work out counts for each word
maxFreq <- max(freq)                           # Find what the maximum count is
mostCommonWord <- names(freq)[freq == maxFreq] # Find all matches of the maximum value
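If the column holds whole titles rather than single words, as the "title" variable in the question seems to, the strings would first need to be split into words before counting. Below is a minimal base R sketch of that idea. The sample data frame is a hypothetical stand-in for `one`, since the real data is not shown, and the splitting rule (lower-case everything, break on anything that is not a letter, digit, or apostrophe) is only an assumption about what counts as a word.

```r
# Hypothetical stand-in for the asker's data frame `one`; the real titles are not shown.
one <- data.frame(
  title = c("The Greatest Showman",
            "The Last Jedi Trailer",
            "Greatest Hits of the Year"),
  stringsAsFactors = FALSE
)

# Lower-case the titles, then split on runs of characters that are
# not alphanumeric and not an apostrophe (assumed definition of a word).
words <- unlist(strsplit(tolower(one$title), "[^[:alnum:]']+"))
words <- words[words != ""]   # drop empty strings left by leading separators

freq <- table(words)                                # count each distinct word
mostCommonWords <- names(freq)[freq == max(freq)]   # every word tied for first place
mostCommonCount <- max(freq)                        # how often it occurs

# Peek at the five most frequent words.
head(sort(freq, decreasing = TRUE), 5)
```

On real titles, very common words such as "the" are likely to come out on top, so it may be worth dropping them from `words` before building the table.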
LachlanO