R: grep multiple strings at once

Question

I have a data frame with 1 variable and 5,000 rows, where each element is a string.

1. "Am open about my feelings."                   
2. "Take charge."                                 
3. "Talk to a lot of different people at parties."
4. "Make friends easily."                         
5. "Never at a loss for words."                   
6. "Don't talk a lot."                            
7. "Keep in the background."                      
   .....
5000. "Speak softly."

I need to find and output row numbers that correspond to 3 specific elements. Currently, I use the following:

grep("Take charge." ,  df[,1]) 
grep("Make friends easily.",  df[,1])  
grep("Speak softly.",  df[,1])

And get the following output: [1] 2 [2] 4 [3] 5000

Question 1. Is there a way to make the syntax more succinct, so I do not have to repeat grep and df[,1] on every line?

Question 2. If so, how can I output a single numeric vector of the matching row positions, so the result looks something like this?

2, 4, 5000

What I tried so far.
grep("Take charge.", "Make friends easily.", "Speak softly.", df[,1]) # this didn't work

I also tried creating a vector, called m1, that contains all three elements and then calling grep(m1, df[,1]) # this didn't work either

Basically a duplicate of [grep using a character vector with multiple patterns](https://stackoverflow.com/q/7597559/903061) or [this](https://stackoverflow.com/q/9537797/903061) (substituting `grepl` for `regexpr`). — Gregor Thomas, May 04 '19 at 22:02
Just in case the answers aren't clear, you have options such as `patterns = c("Take charge.", "Make friends easily.")`, an easy way is `which(grepl(paste(patterns, collapse = "|"), df[,1]))`. This is standard regex where `.` matches any single character---if you want to match a literal `"."` escape it in your patterns, e.g., `"Take charge\\."`. — Gregor Thomas, May 04 '19 at 22:09
But Gabor has a good point in his answer--if these are complete, exact matches then a non-regex solution will be simpler and more efficient. — Gregor Thomas, May 04 '19 at 23:03
I tried both solutions, for some reason your solution worked perfectly but Gabor's not, maybe I did something wrong though. Anyways thank you for your input! I very much appreciate it! — PsychometStats, May 04 '19 at 23:10
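
The regex approach from the comments above can be sketched as follows. This is only an illustration on a made-up five-row data frame (the `df` contents and the names `patterns`, `escaped`, `rx`, and `rows` are assumptions, not from the original post). It escapes literal periods before joining the patterns with `|`, so `.` no longer matches any character.

```r
# Toy stand-in for the real 5,000-row data frame (assumed for illustration)
df <- data.frame(
  item = c("Am open about my feelings.",
           "Take charge.",
           "Talk to a lot of different people at parties.",
           "Make friends easily.",
           "Speak softly."),
  stringsAsFactors = FALSE
)

patterns <- c("Take charge.", "Make friends easily.", "Speak softly.")

# Escape literal periods only; other regex metacharacters are not handled here
escaped <- gsub(".", "\\.", patterns, fixed = TRUE)

# Join into a single alternation pattern
rx <- paste(escaped, collapse = "|")

# One call returns all matching row numbers as a single integer vector
rows <- grep(rx, df[, 1])
rows
```

Note that `grep` returns positions in row order, not in the order the patterns were supplied.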

G. Grothendieck · Accepted Answer · 2019-05-04T22:13:03.153

3

Since these are exact matches, use the following, where phrases is a character vector of the phrases you want to match:

match(phrases, df[, 1])

This also works provided no phrase is a substring of another phrase:

grep(paste(phrases, collapse = "|"), df[, 1])
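
As a rough illustration of how exact matching with `match` behaves, here is a small sketch on made-up data (the `df` contents and the phrase "Not in the data." are assumptions for the example). The result follows the order of `phrases`, and a phrase with no exact match comes back as `NA`.

```r
# Toy data frame (assumed for illustration)
df <- data.frame(
  item = c("Am open about my feelings.",
           "Take charge.",
           "Make friends easily.",
           "Speak softly."),
  stringsAsFactors = FALSE
)

# The last phrase is deliberately absent from the data
phrases <- c("Speak softly.", "Take charge.", "Not in the data.")

# Position of the first exact match for each phrase, in the order of `phrases`
idx <- match(phrases, df[, 1])

# Drop phrases that were not found
found <- idx[!is.na(idx)]
found
```

Because `match` compares whole strings, stray whitespace or different capitalisation in the data would also produce `NA`.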

edited May 04 '19 at 22:13

answered May 04 '19 at 22:09


1

Since OP isn't clear about uniqueness, probably worth mentioning the difference between `match(phrases, df[, 1])` and `which(df[, 1] %in% phrases)`. – Gregor Thomas May 04 '19 at 22:11
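
A short sketch of that difference, using a made-up vector `x` in place of `df[, 1]` (the data and variable names are assumptions for illustration). When a phrase occurs more than once, `match` reports only its first position, while `which(... %in% ...)` reports every position.

```r
# "Take charge." appears twice in this toy vector
x <- c("Take charge.", "Speak softly.", "Take charge.", "Make friends easily.")
phrases <- c("Take charge.", "Make friends easily.")

# First occurrence of each phrase, in the order of `phrases`
first_hits <- match(phrases, x)

# Every position whose value is one of the phrases, in row order
all_hits <- which(x %in% phrases)

first_hits
all_hits
```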
