
I have a data frame with 1 variable and 5,000 rows, where each element is a string.

1. "Am open about my feelings."                   
2. "Take charge."                                 
3. "Talk to a lot of different people at parties."
4. "Make friends easily."                         
5. "Never at a loss for words."                   
6. "Don't talk a lot."                            
7. "Keep in the background."                      
   .....
5000. "Speak softly."           

I need to find and output row numbers that correspond to 3 specific elements. Currently, I use the following:

grep("Take charge." ,  df[,1]) 
grep("Make friends easily.",  df[,1])  
grep("Speak softly.",  df[,1])

And get the following output: [1] 2 [2] 4 [3] 5000

Question 1. Is there a way to make the syntax more succinct, so that I do not have to repeat grep and df[,1] on every single line?

Question 2. If so, how can I output a single numeric vector of the matching row positions, so that the result looks something like this?

2, 4, 5000

What I tried so far.
grep("Take charge." , "Make friends easily.", "Speak softly.", df[,1]) # this didn't work

I also tried creating a vector, called m1, that contains all three elements and then calling grep(m1, df[,1]), but this didn't work either.

PsychometStats
  • Basically a duplicate of [grep using a character vector with multiple patterns](https://stackoverflow.com/q/7597559/903061) or [this](https://stackoverflow.com/q/9537797/903061) (substituting `grepl` for `regexpr`). – Gregor Thomas May 04 '19 at 22:02
  • Just in case the answers aren't clear, you have options such as `patterns = c("Take charge.", "Make friends easily.")`; an easy way is `which(grepl(paste(patterns, collapse = "|"), df[,1]))`. This is standard regex, where `.` matches any single character. If you want to match a literal `"."`, escape it in your patterns, e.g., `"Take charge\\."`. – Gregor Thomas May 04 '19 at 22:09
  • But Gabor has a good point in his answer: if these are complete, exact matches then a non-regex solution will be simpler and more efficient. – Gregor Thomas May 04 '19 at 23:03
  • I tried both solutions, for some reason your solution worked perfectly but Gabor's not, maybe I did something wrong though. Anyways thank you for your input! I very much appreciate it! – PsychometStats May 04 '19 at 23:10
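
The regex approach described in the comments can be sketched as follows. This is only an illustration: the eight-row `df` is a hypothetical stand-in for the 5,000-row data frame in the question, so "Speak softly." sits in row 8 here rather than row 5000.

```r
# Hypothetical toy version of the data frame from the question.
df <- data.frame(
  item = c("Am open about my feelings.", "Take charge.",
           "Talk to a lot of different people at parties.",
           "Make friends easily.", "Never at a loss for words.",
           "Don't talk a lot.", "Keep in the background.", "Speak softly."),
  stringsAsFactors = FALSE
)

# Dots are escaped so that each "." matches a literal period.
patterns <- c("Take charge\\.", "Make friends easily\\.", "Speak softly\\.")

# Join the patterns with "|" (regex OR) so a single grep call handles all of them.
regex <- paste(patterns, collapse = "|")

# grep returns the matching row positions directly, in row order.
rows <- grep(regex, df[, 1])
print(rows)
```

With this toy data, `rows` is the integer vector 2, 4, 8. Because this is a regex search, any row that merely contains one of the phrases would also be returned.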

1 Answer


Since these are exact matches use this where phrases is a character vector of the phrases you want to match:

match(phrases, df[, 1])

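This also works provided no phrase is a substring of another phrase. Note that grep accepts a single pattern, so the phrases are first collapsed into one regular expression with "|":

grep(paste(phrases, collapse = "|"), df[, 1])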
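
As a minimal sketch of the exact-match approach, the example below runs `match` on a small, hypothetical stand-in for the question's data frame (eight rows instead of 5,000, so "Speak softly." is row 8). The names `df`, `phrases`, and `rows` are illustrative.

```r
# Hypothetical toy version of the data frame from the question.
df <- data.frame(
  item = c("Am open about my feelings.", "Take charge.",
           "Talk to a lot of different people at parties.",
           "Make friends easily.", "Never at a loss for words.",
           "Don't talk a lot.", "Keep in the background.", "Speak softly."),
  stringsAsFactors = FALSE
)

# The phrases to look up; no regex escaping is needed for exact matching.
phrases <- c("Take charge.", "Make friends easily.", "Speak softly.")

# match returns, for each phrase, the position of its first exact match in the column.
rows <- match(phrases, df[, 1])
print(rows)

# A phrase that is not present yields NA rather than being dropped.
missing <- match("Not an item.", df[, 1])
print(missing)
```

Here `rows` is 2, 4, 8, reported in the order of `phrases` rather than in row order.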
G. Grothendieck
  • Since OP isn't clear about uniqueness, probably worth mentioning the difference between `match(phrases, df[, 1])` and `which(df[, 1] %in% phrases)`. – Gregor Thomas May 04 '19 at 22:11
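
The difference mentioned in the last comment shows up when a phrase occurs more than once. The sketch below uses a small hypothetical vector `items` in place of `df[, 1]`, with "Take charge." deliberately duplicated.

```r
# Hypothetical column contents with one duplicated item.
items <- c("Take charge.", "Speak softly.", "Take charge.", "Make friends easily.")
phrases <- c("Make friends easily.", "Take charge.")

# match: one position per phrase (its first occurrence), in the order of phrases.
first_hits <- match(phrases, items)
print(first_hits)

# which + %in%: every row whose value is one of the phrases, in row order.
all_hits <- which(items %in% phrases)
print(all_hits)
```

With this data, `first_hits` is 4, 1 and `all_hits` is 1, 3, 4. If every item is unique, the two give the same set of rows and differ only in ordering.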