I'm using R and would like to create an in-text citation from a full citation using regex. For instance, I have:

Ali, D. A., Deininger, K., & Goldstein, M. (2014). Environmental and gender 
impacts of land tenure regularization in Africa: Pilot evidence from Rwanda. 
Journal of Development Economics, 110, 262–275.

I would like a regex that pulls all information up to the first 4-digit number, including the closing parenthesis. Like this:

Ali, D. A., Deininger, K., & Goldstein, M. (2014)

Any suggestions? Thanks.

elliot

1 Answer

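We can use `sub` and capture everything up to the first 4-digit number in parentheses. The quantifier is lazy (`.*?`), so the capture stops at the first `(dddd)` rather than the last one:

sub("^(.*?\\(\\d{4}\\)).*", "\\1", txt, perl = TRUE)
#[1] "Ali, D. A., Deininger, K., & Goldstein, M. (2014)"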

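Another approach is to find the position of the first 4-digit number enclosed in parentheses with `regexpr`, and then take the substring from the start of the string up to the closing parenthesis with `substr`:

# the lookarounds keep the parentheses out of the match, so i1 points at the first digit
i1 <- regexpr("(?<=\\()\\d{4}(?=\\))", txt, perl = TRUE)
# i1 + match.length is the position of the closing parenthesis
substr(txt, 1, i1 + attr(i1, "match.length"))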
#[1] "Ali, D. A., Deininger, K., & Goldstein, M. (2014)"
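
If there are several citations to process at once, the same idea can be wrapped in a small function. This is only a sketch: `in_text_citation` and `refs` are names made up for the illustration, and returning `NA` for strings without a `(dddd)` group is an assumption about what is wanted. The `(?s)` flag is there in case a citation contains line breaks.

```r
# Hypothetical helper: everything up to and including the first "(dddd)".
in_text_citation <- function(x) {
  # (?s) lets "." cross line breaks; ".*?" is lazy, so the match ends at the first year
  m <- regexpr("(?s)^.*?\\(\\d{4}\\)", x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the elements that matched
  out[m != -1] <- regmatches(x, m)
  out
}

refs <- c(
  "Ali, D. A., Deininger, K., & Goldstein, M. (2014). Environmental and gender impacts of land tenure regularization in Africa: Pilot evidence from Rwanda. Journal of Development Economics, 110, 262–275.",
  "No year in this string"
)
in_text_citation(refs)
#[1] "Ali, D. A., Deininger, K., & Goldstein, M. (2014)"
#[2] NA
```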
akrun
  • This `[^(]` is not a good idea, akrun. What if there is a `(` before the first (4 digits)? – Wiktor Stribiżew Apr 27 '18 at 13:06
  • Now, `"^(.*\\(\\d{4}\\)).*"` will match the *last* occurrence of (4digits) and `"\\d{4}"` does not really match (4digits) but just 4digits. It seems you are overcomplicating the solution, especially the one with `sub`. – Wiktor Stribiżew Apr 27 '18 at 13:18
  • If you are talking about the second solution, I think I didn't make it specific. Corrected. – akrun Apr 27 '18 at 14:06
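
To see the greedy versus lazy difference raised in the comments, here is a small sketch. The string `txt2` is made up for the illustration (it is not from the question) and contains a second 4-digit group in parentheses later on:

```r
# Made-up citation with two "(dddd)" groups
txt2 <- "Smith, J. (2014). A reply to Jones (2009). Some Journal, 1, 1-10."

# Greedy ".*" runs as far as it can, so the capture ends at the LAST "(dddd)"
greedy <- sub("^(.*\\(\\d{4}\\)).*", "\\1", txt2)
# Lazy ".*?" stops as early as it can, so the capture ends at the FIRST "(dddd)"
lazy <- sub("^(.*?\\(\\d{4}\\)).*", "\\1", txt2, perl = TRUE)

greedy
#[1] "Smith, J. (2014). A reply to Jones (2009)"
lazy
#[1] "Smith, J. (2014)"
```

For the citation in the question both patterns give the same result, because it has only one such group.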