
I've been searching for a way to ask grep to return the whole line for a matching pattern. Is there functionality for this in R's grep()? I am imagining something like the Unix grep argument -A n.
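Base R's grep() has no direct equivalent of Unix grep -A n (print each match plus the n lines after it), but a minimal sketch can build one on top of the indices grep() returns. The helper name grep_after() and the sample text are hypothetical. The sketch assumes the text is already a character vector with one element per line, for example as read with readLines().

```r
# Hypothetical helper: mimic Unix `grep -A n` on a character vector,
# returning each matching element plus up to `n` elements after it.
grep_after <- function(pattern, lines, n = 1) {
  hits <- grep(pattern, lines)                 # indices of matching elements
  if (length(hits) == 0) return(character(0))
  # expand each hit into the range hit .. hit + n, clipped to the vector length
  idx <- unlist(lapply(hits, function(i) i:min(i + n, length(lines))))
  lines[sort(unique(idx))]                     # drop overlaps, keep original order
}

# Hypothetical sample text, one element per line
txt <- c("Intro", "As put forth by Smith (2020),", "results were mixed.", "Conclusion")
grep_after("Smith", txt, n = 1)
#> [1] "As put forth by Smith (2020)," "results were mixed."
```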

Some context: for a paper I've written, I want to create a data table or a vector of all the citations in the paper. Extracting everything within parentheses using qdapRegex::rm_round() sometimes returns only a year (for citations written like 'As put forth by Smith (2020)'). It would be nice to grab the whole sentence instead of just '2020'.
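To illustrate the problem, here is a base R stand-in for extracting parenthesized text (regmatches() with gregexpr(), not qdapRegex::rm_round() itself), run on a hypothetical sentence. The narrative citation yields only the parenthesized year, while the parenthetical citation keeps the author's name.

```r
# Hypothetical sample sentence with one narrative and one parenthetical citation
s <- "As put forth by Smith (2020), citations vary (Jones 2019)."

# Extract every "(...)" group; [^)]* stops at the first closing parenthesis
parens <- regmatches(s, gregexpr("\\([^)]*\\)", s))[[1]]
parens
# contains "(2020)" and "(Jones 2019)": the author "Smith" is lost for the first citation
```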

Any thoughts? Thank you!

Ed Morton
milo
  • Sure, that's fairly straightforward, depending on your input data. Can you share sample data and your expected result? – Skaqqs Jan 15 '22 at 00:13

2 Answers


grep() has an argument, value, which you can set to TRUE to get the matching strings themselves back instead of their positions.

Consider this example where you are looking for numbers.

x <- c('This is 2022', 'This is not a year', '2021 was last year')
grep('\\d+', x)
#[1] 1 3

By default, grep returns the indices of the elements where a match is found.

If you need the complete strings as output, set value = TRUE:

grep('\\d+', x, value = TRUE)
#[1] "This is 2022"       "2021 was last year" 
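Two related base R options, shown here as a short sketch on the same x: grepl() returns a logical vector that can be used to subset (giving the same result as value = TRUE), and regmatches() with regexpr() returns only the matched text rather than the whole string.

```r
x <- c('This is 2022', 'This is not a year', '2021 was last year')

# grepl() returns one TRUE/FALSE per element
hits <- grepl('\\d+', x)
hits
#> [1]  TRUE FALSE  TRUE

# Subsetting with it gives the same result as grep(value = TRUE)
identical(x[hits], grep('\\d+', x, value = TRUE))
#> [1] TRUE

# regmatches() + regexpr() keeps only the matched text; non-matches are dropped
regmatches(x, regexpr('\\d+', x))
#> [1] "2022" "2021"
```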
Ronak Shah
  • Great, I will try to use this! I was definitely misunderstanding the function of the value argument. I appreciate the clarification on this. – milo Jan 15 '22 at 22:16
s <- c("As put forth by Smith (2020)",
       "As put forth by Smith 2020",
       "As put forth by Smith",
       "As put forth (Smith 2020)")

s[grep(pattern = "\\(.*\\)", x = s)]
#> [1] "As put forth by Smith (2020)" "As put forth (Smith 2020)"

Created on 2022-01-14 by the reprex package (v2.0.1)
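If the paper is one long string rather than a vector of sentences, one approach is to split it into sentences first and then keep those containing a parenthetical with a four-digit year. This is a minimal sketch under stated assumptions: the sample text is hypothetical, the split is a naive one on ., ! or ? followed by whitespace (so abbreviations such as "et al." would be split incorrectly), and years are assumed to be four digits.

```r
# Hypothetical sample text standing in for the paper
paper <- "Theory matters. As put forth by Smith (2020), effects vary. Others disagree (Jones 2019). No citation here."

# Naive sentence split: break on whitespace that follows ., ! or ?
# (assumption: no abbreviations like "et al." in the text)
sentences <- unlist(strsplit(paper, "(?<=[.!?])\\s+", perl = TRUE))

# Keep sentences with a "(...)" group containing four digits,
# which catches both "Smith (2020)" and "(Jones 2019)" styles
cited <- grep("\\([^)]*\\d{4}[^)]*\\)", sentences, value = TRUE)
cited
```

The same grep(value = TRUE) call does the filtering; the only extra step is producing one element per sentence beforehand.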
Skaqqs