
I'm kinda lost with Regex and would appreciate some help.

Goal: extract the URL between the two double quotes, without returning the quotes themselves.

Base string:

<a href="somerandomurl" class="btn btn-xs btn-default "><span class="fa fa-eye fa-fw poptip" data-toggle="tooltip" title="" data-original-title="Inspect in-game"></span></a>

I came up with the following solution:

(="(.*)" class="btn btn-xs btn-default ")

Unfortunately, it matches:

="somerandomurl" class="btn btn-xs btn-default "

Is it possible to match only the inner result, without the delimiters?

somerandomurl

Since this should be included in a script that should run as fast as possible, maybe there is a faster and better approach? In reality this regex search will be applied on a complete website.

l4m0r
  • 4
    Best not to try to parse HTML with regex. What language? Use an HTML parser instead – CertainPerformance Mar 08 '20 at 10:30
  • 2
    Use a [DOMParser](https://developer.mozilla.org/en-US/docs/Web/API/DOMParser) with for example `document.querySelectorAll("a.btn.btn-xs.btn-default");` and get the `href` – The fourth bird Mar 08 '20 at 10:51
  • 1
    What language/tool are you using? From the [regex tag info](https://stackoverflow.com/tags/regex/info): "Since regular expressions are not fully standardized, all questions with this tag should also include a tag specifying the applicable programming language or tool." – Toto Mar 08 '20 at 11:27
  • [Parsing HTML with regex is a hard job](https://stackoverflow.com/a/4234491/372239) HTML and regex are not good friends. Use a parser, it is simpler, faster and much more maintainable. – Toto Mar 08 '20 at 11:28
  • Look at the answers to this similar question: https://stackoverflow.com/questions/1454913/regular-expression-to-find-a-string-included-between-two-characters-while-exclud?rq=1 – Poul Bak Mar 08 '20 at 12:38
  • I saw it, but I wasn't able to translate it to my problem – l4m0r Mar 08 '20 at 19:51

1 Answer


Using RegEx to match markup is usually not a good idea. If you have the option, you might want to prefer an HTML/DOM parser.

That said, your RegEx should match the sample in most languages. However, it defines two sets of parentheses, so the result you want is in group 2. Groups 0 and 1 both hold the full match.
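To illustrate the group numbering, here is a minimal sketch in JavaScript. The language is an assumption on my part, since the question does not name one. It runs the original pattern against the sample string, then shows a variant with a single capture group. The names `tighter` and `extractUrls` are made up for this example, and the `href=` prefix and `[^"]*` in the variant are my own suggestions, not part of the original pattern.

```javascript
// Sample markup from the question.
const html = '<a href="somerandomurl" class="btn btn-xs btn-default ">' +
  '<span class="fa fa-eye fa-fw poptip" data-toggle="tooltip" title="" ' +
  'data-original-title="Inspect in-game"></span></a>';

// The original pattern: the outer parentheses form group 1,
// the inner parentheses form group 2.
const original = /(="(.*)" class="btn btn-xs btn-default ")/;
const m = html.match(original);

console.log(m[0]); // the full match, including the delimiters
console.log(m[1]); // same text as m[0], because group 1 wraps the whole pattern
console.log(m[2]); // only the text between the quotes

// Hypothetical variant: one capture group, and [^"]* instead of .*
// so the match cannot run past the closing quote of the attribute.
const tighter = /href="([^"]*)" class="btn btn-xs btn-default "/g;

// Collect group 1 of every match in a larger page.
function extractUrls(page) {
  return Array.from(page.matchAll(tighter), (match) => match[1]);
}

console.log(extractUrls(html + html));
```

With only one pair of parentheses, the URL ends up in group 1. The negated character class also matters once the pattern runs over a whole page, where a greedy `.*` could span several attributes on the same line.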

If you have trouble reading the correct result group, please provide some additional information, such as which language you're working in and, preferably, a snippet.

Roland Kreuzer
  • I used Cheerio first (a DOM parser) but noticed it adds 30 ms delay/computing time, whereas regex adds only 2 ms delay/computing time. Too bad I'm so bad with its syntax :< In my use case every ms matters – l4m0r Mar 08 '20 at 19:50