How to check if a string includes a specific word in R

Question

I have the Simpsons data from kaggle.com, which includes the title of each episode. I want to count how many times the character names are used in each title. I can find exact words in the titles, but my code misses variants such as Homers when I look for Homer. Is there a way to do this?

Data example and my code:

text <- 'title
Homer\'s Night Out
Krusty Gets Busted
Bart Gets an "F"
Two Cars in Every Garage and Three Eyes on Every Fish
Dead Putting Society
Bart the Daredevil
Bart Gets Hit by a Car
Homer vs. Lisa and the 8th Commandment
Oh Brother, Where Art Thou?
Old Money
Lisa\'s Substitute
Blood Feud
Mr. Lisa Goes to Washington
Bart the Murderer
Like Father, Like Clown
Saturdays of Thunder
Burns Verkaufen der Kraftwerk
Radio Bart
Bart the Lover
Separate Vocations
Colonel Homer'

simpsons <- read.csv(text = text, stringsAsFactors = FALSE)

library(stringr)

titlewords <- paste(simpsons$title, collapse = " " )
words <- c('Homer')
titlewords <- gsub("[[:punct:]]", "", titlewords)
HomerCount <- str_count(titlewords, paste(words, collapse=" "))
HomerCount

Possible duplicate of [Selecting rows where a column has a string like 'hsa..' (partial string match)](http://stackoverflow.com/questions/13043928/selecting-rows-where-a-column-has-a-string-like-hsa-partial-string-match) — Sam Firke, Nov 13 '16 at 20:56
And `sapply(gregexpr("Homer", simpsons$title), function(x) sum(x > 0))` for the count per string. — Rich Scriven, Nov 13 '16 at 21:00
Is it possible to find out which strings contain Homer? Rich's answer gives me a table of 1s and 0s, but as I have 600 lines I don't know which lines in the list they correspond to. I don't know if that is possible, but it would be great if it is! — Tugrul Uzel, Nov 13 '16 at 21:18

score 0 · Answer 1 · answered Nov 14 '16 at 02:17

As an alternative to the excellent suggestions in the comments, you can also use the tidytext package:

library(tidytext)
library(dplyr)

text <- 'title
Homer\'s Night Out
Krusty Gets Busted
Bart Gets an "F"
Two Cars in Every Garage and Three Eyes on Every Fish
Dead Putting Society
Bart the Daredevil
Bart Gets Hit by a Car
Homer vs. Lisa and the 8th Commandment
Oh Brother, Where Art Thou?
Old Money
Lisa\'s Substitute
Blood Feud
Mr. Lisa Goes to Washington
Bart the Murderer
Like Father, Like Clown
Saturdays of Thunder
Burns Verkaufen der Kraftwerk
Radio Bart
Bart the Lover
Separate Vocations
Colonel Homer'

simpsons <- read.csv(text = text, stringsAsFactors = FALSE)

# Number of homers
simpsons %>%
  unnest_tokens(word, title) %>% 
  summarize(count = sum(grepl("homer", word)))

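# Line locations of homers
# Record each title's line number before tokenizing, so that it
# survives the split into one row per word
simpsons %>% 
  mutate(line = row_number()) %>% 
  unnest_tokens(word, title) %>% 
  filter(grepl("homer", word))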
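For comparison, the gregexpr() suggestion from the comments can probably be extended in base R to also report which titles match. This is only a minimal sketch: the `titles` vector is a hypothetical sample built from a few of the titles in the question, and the names `per_title`, `homer_lines` and `total` are made up for illustration.

```r
# Hypothetical sample: a few of the titles from the question
titles <- c("Homer's Night Out",
            "Krusty Gets Busted",
            "Homer vs. Lisa and the 8th Commandment",
            "Colonel Homer")

# Number of matches in each title; gregexpr() returns -1 where
# there is no match, so only positive positions are counted
per_title <- sapply(gregexpr("Homer", titles, fixed = TRUE),
                    function(m) sum(m > 0))

# Positions of the titles that contain "Homer" at least once
homer_lines <- which(per_title > 0)

# Total number of matches across all titles
total <- sum(per_title)

# Line number next to the matching title
data.frame(line = homer_lines, title = titles[homer_lines])
```

Because the pattern is matched as a substring, "Homer's" (or "Homers" once punctuation is stripped) is counted as well. With the full data you would pass simpsons$title instead of the sample vector.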
