I am working on a text analysis project in R and want to search for certain words in my response data.

Essentially I am trying to write this:

df$word1 <- ifelse(grepl("word1", df$responses), 1, 0)

Where df$word1 is a new column in df, "word1" is the pattern to search for in the column of responses (df$responses).

This works fine for one word, but I have ~250 patterns that I want to search for individually. Is there a way to do this without manually writing code for each one? I appreciate any help I can get!

  • Hi and welcome! Could you edit your question to include a more complete example (say, with three words), reproducible data (i.e., `dput(df[15,])`), and the expected output? I am unclear what your input data or expected output would look like (e.g., is `df$responses` sentences? single words?). Folks here are happy to help; they just want to make sure their effort is put towards the correct application of the code. You can find some tips on how to make your question reproducible [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Good luck! – jpsmith Jul 11 '23 at 20:47

2 Answers

You can do this in one line with lapply, and you don't need ifelse. Because grepl returns a logical vector, the unary + operator converts it to 1s and 0s. You also need to wrap each word in word boundaries (\\b) so that a search for "here" does not also match "there", as it would with grepl("here", df$responses):

Data

df <- data.frame(responses = c("there", "are", "single", "words", "here"))

wordsearch <- c("there", "are", "words", "here")
df[wordsearch] <- lapply(wordsearch, 
                         function(x) +grepl(paste0("\\b", x, "\\b"), df$responses))

Output:

  responses there are words here
1     there     1   0     0    0
2       are     0   1     0    0
3    single     0   0     0    0
4     words     0   0     1    0
5      here     0   0     0    1
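
To make the word-boundary point concrete, here is a minimal sketch comparing a plain pattern with a bounded one. The `responses` vector simply reuses the example data above, and `plain` and `bounded` are names chosen for illustration.

```r
# Example responses, reused from the data above
responses <- c("there", "are", "single", "words", "here")

# Plain pattern: "here" is a substring of "there", so both rows match
plain <- grepl("here", responses)

# Pattern wrapped in word boundaries: only the standalone word matches
bounded <- grepl("\\bhere\\b", responses)

# Unary plus coerces the logical vectors to 0/1
print(+plain)
print(+bounded)
```

Without the boundaries, the first row would be flagged for "here" even though it only contains "there".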

You can loop over your list of words and dynamically create new columns for each. Suppose you have a character vector words with all your words:

words <- c("word1", "word2", "word250")  # Placeholders; replace with your full list of actual words

Then you can use a loop to create new columns:

for (word in words) {
    df[[word]] <- ifelse(grepl(word, df$responses, ignore.case = TRUE), 1, 0)
}

Here, df[[word]] is used to dynamically create a new column for each word in your words vector. The ignore.case = TRUE argument makes the matching case-insensitive. If you want case-sensitive matching, just remove this argument.
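
As a small illustration of what ignore.case changes, the sketch below runs the same pattern both ways. The `responses` vector here is made-up example data, not taken from the question.

```r
# Made-up responses with mixed capitalisation
responses <- c("Word1 appears here", "no match", "word1 again")

# Case-sensitive (default): "Word1" in the first response is not matched
sensitive <- grepl("word1", responses)

# Case-insensitive: both "Word1" and "word1" are matched
insensitive <- grepl("word1", responses, ignore.case = TRUE)

print(sensitive)
print(insensitive)
```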

This will add 250 new columns to your data frame df, one for each word in your words vector. Each column will be a binary indicator of whether the corresponding word was found in df$responses.

This approach extends easily: if you have more words to search for in the future, add them to your words vector and rerun the loop.
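
If you expect to rerun this with new words, one option is to wrap the loop in a small helper. The sketch below is only an illustration: `add_word_flags` and its `text_col` argument are hypothetical names, the data frame is made up, and the helper assumes the words contain no regex metacharacters. It also adds word boundaries around each word, which the loop above does not do.

```r
# Hypothetical helper: adds one 0/1 column per word to a data frame
add_word_flags <- function(df, words, text_col = "responses") {
  for (word in words) {
    # Word boundaries so that "here" does not match inside "there"
    pattern <- paste0("\\b", word, "\\b")
    df[[word]] <- as.integer(grepl(pattern, df[[text_col]], ignore.case = TRUE))
  }
  df
}

# Made-up example data
df <- data.frame(responses = c("There are words here", "single word", "nothing"))

# Initial set of words
df <- add_word_flags(df, c("here", "words"))

# Later, add another word without touching the existing columns
df <- add_word_flags(df, "single")

print(df)
```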
