
Say I have the following dataset and want to fill in the column "freq" with the frequency of each word in the column "word".

#df
word      freq
a
um
yeah
I'm
no
a

The outcome should look like this:

word      freq
a          2 
um         1
yeah       1
I'm        1
no         1
a          2

How should I code this in R?

2 Answers


Using dplyr, you can do something like this:

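library(dplyr)
# Example data from the question
df <- tibble(word = c("a", "um", "yeah", "I'm", "no", "a"))

# Count the rows in each word group and store the count in "freq"
df %>%
  group_by(word) %>%
  add_tally(name = "freq") %>%
  ungroup()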
tombear1
  • Great! I tried to make a new column and then fill it in, but this creates many redundant columns. Any idea? df$freq <- df %>% group_by(word) %>% add_tally(name = "freq") %>% ungroup() – aoooooiiiiiiiiiiiiiii Jan 26 '23 at 22:20
  • Assign the result of the pipe back to the data frame itself, not to the column "freq": df <- df %>% group_by(word) %>% add_tally(name = "freq") %>% ungroup() – tombear1 Jan 31 '23 at 14:31
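If the goal really is to fill in a single column, as the first comment attempts, a base R sketch with ave() may be worth a look. This is an alternative to the dplyr pipeline, not part of the answer above; the data frame simply recreates the question's example.

```r
# Recreate the example data from the question
df <- data.frame(word = c("a", "um", "yeah", "I'm", "no", "a"),
                 stringsAsFactors = FALSE)

# ave() applies FUN within each group of df$word and returns a vector
# aligned with the original rows, so it can be assigned to one column.
df$freq <- ave(seq_along(df$word), df$word, FUN = length)

print(df)
```

Because ave() returns a plain vector rather than a data frame, assigning it to df$freq adds exactly one column.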

To count each word, including repeated occurrences within a single observation, you could use sapply with stringr::str_count (with paste0 adding the "\\b" word boundaries) and then sum by row:

df$count <- rowSums(sapply(df$word, function(x) 
                             stringr::str_count(df$word, paste0("\\b", x,"\\b"))))
#  word freq count
#1    a    2     2
#2   um    1     1
#3 yeah    1     1
#4  I'm    1     1
#5   no    1     1
#6    a    2     2
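To see why rowSums produces a per-row frequency here, it can help to look at the intermediate matrix that sapply builds. The following is a small sketch on a shortened version of the question's data; the names words, m, and counts are made up for illustration, and it assumes the stringr package is installed.

```r
library(stringr)  # assumes the stringr package is installed

words <- c("a", "um", "a")  # shortened version of the question's data

# One column per pattern, one row per element of `words`:
# entry [i, j] is how often word j occurs (as a whole word) in element i.
m <- sapply(words, function(x) str_count(words, paste0("\\b", x, "\\b")))

# Summing across each row adds up every match found in that element.
counts <- rowSums(m)

print(m)
print(counts)
```

Each word is used as a regular expression, so entries containing regex metacharacters would need escaping before being passed to str_count.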

Data

df <- read.table(text = "word      freq
a          2 
um         1
yeah       1
I'm        1
no         1
a          2", header = TRUE)

Note that if your data were in a simple vector of strings named str, you would do:

rowSums(sapply(str, function(x) stringr::str_count(str, paste0("\\b", x,"\\b"))))
jpsmith