0

I want to retrieve the first number (here 344002) from a string:

string <- '<a href="/Archiv-Suche/!344002&amp;s=&amp;SuchRahmen=Print/" ratiourl-ressource="344002"'

Preferably, I am looking for a regular expression that matches the digits after the ! and before the &amp.

All I came up with is this, but it captures the ! as well (!344002):

regmatches(string, gregexpr("\\!([[:digit:]]+)", string, perl =TRUE))

Any ideas?

thelatemail

3 Answers

2

Use this regex:

(?<=\!)\d+(?=&amp)

Use this code:

regmatches(string, gregexpr("(?<=\\!)\\d+(?=&amp)", string, perl=TRUE))
  • (?<=\!) is a lookbehind: the match starts immediately after the !
  • \d+ matches one or more digits
  • (?=&amp) is a lookahead: the match must be immediately followed by &amp
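The lookaround behaviour described above can be sketched in base R as follows. This is a minimal illustration, assuming only the first occurrence is wanted (hence `regexpr` rather than `gregexpr`); the variable name `first_number` is made up for the example.

```r
# Sample input copied from the question.
string <- '<a href="/Archiv-Suche/!344002&amp;s=&amp;SuchRahmen=Print/" ratiourl-ressource="344002"'

# The lookbehind (?<=!) and lookahead (?=&amp) are zero-width: they constrain
# where the match may sit but are not part of the matched text.
# perl = TRUE is needed because R's default regex engine does not support lookarounds.
m <- regexpr("(?<=!)\\d+(?=&amp)", string, perl = TRUE)

# regmatches() returns only the matched digits, without the ! or the &amp.
first_number <- regmatches(string, m)
print(first_number)
```

Because `regexpr` reports only the first match per string, the result here is a plain character vector rather than the list that `gregexpr` produces.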
Nicolas
  • 1
    You need to double escape like \\d+ and I don't think you need to escape the ! at all `regmatches(string, gregexpr("(?<=!)\\d+(?=&amp)", string, perl=TRUE))` for instance. But +1. – thelatemail Nov 14 '16 at 01:44
  • Thank you for the quick help! –  Nov 14 '16 at 10:39
0
library(gsubfn)
strapplyc(string, "!(\\d+)")[[1]]

[Old answer]

Test this code.

library(stringr)
str_extract(string, "[0-9]+")

A similar question and answer can be found here:

Extract a regular expression match in R version 2.10

Community
JKim
0

You may capture the digits (\d+) between ! and &amp; and retrieve them with regexec/regmatches:

> string <- '<a href="/Archiv-Suche/!344002&amp;s=&amp;SuchRahmen=Print/" ratiourl-ressource="344002"'
> pattern = "!(\\d+)&amp;"
> res <- unlist(regmatches(string,regexec(pattern,string)))
> res[2]
[1] "344002"

See the online R demo
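If several strings need to be processed, the same capture-group approach could be wrapped in a small helper. The sketch below is an assumption-laden illustration, not part of the original answer: the function name `extract_id` is hypothetical, and returning `NA` for strings without a match is a design choice made here.

```r
# Hypothetical helper: returns the digits between "!" and "&amp;" for each
# element of x, or NA when the pattern does not match.
extract_id <- function(x) {
  # regexec() keeps capture groups; regmatches() then yields one character
  # vector per input string: full match first, then group 1.
  matches <- regmatches(x, regexec("!(\\d+)&amp;", x))
  vapply(
    matches,
    function(groups) if (length(groups) >= 2) groups[2] else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
}

string <- '<a href="/Archiv-Suche/!344002&amp;s=&amp;SuchRahmen=Print/" ratiourl-ressource="344002"'
ids <- extract_id(c(string, "no exclamation mark here"))
print(ids)
```

Checking the length of each result before indexing avoids an `NA`-by-accident from `res[2]` being silently relied on, and makes the no-match case explicit.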

Wiktor Stribiżew