
I have a problem creating a correct regular expression. Here is what I have so far: https://regex101.com/r/d0epRo/2

I need to add one more parameter to these links, so I have to determine whether the URL already contains a question mark. The `?` should therefore be optional, but I can't get it to work.

These variants don't work: `(\?|)`, `(\?)?`, `(\??)`.

These links should be matched, but aren't: http://www.polskieszlaki.pl and http://www.polskieszlaki.pl/wawel.htm

I have no further ideas. Please help.

piernik
  • Why not use [DOMDocument](http://php.net/manual/en/class.domdocument.php) (with DOMXPath perhaps)? – Wiktor Stribiżew Jun 13 '17 at 09:09
  • I guess it's slower – piernik Jun 13 '17 at 09:11
  • At least you would understand what you are doing. Your regex is a mess, and is not doing what you think it is. Please add the real requirements to the question. – Wiktor Stribiżew Jun 13 '17 at 09:12
  • Obligatory: [**Don't parse HTML with regex**](https://stackoverflow.com/a/1732454/1954610) – Tom Lord Jun 13 '17 at 09:21
  • Also, note that `[^mailto]` is not doing what you think it is. It is saying "One letter that is not `m`, `a`, `i`, `l`, `t` or `o`". (The description on the right of your link tells you this!) A better approach would be to use `https?`. – Tom Lord Jun 13 '17 at 09:52
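
To make the point about `[^mailto]` concrete, here is a minimal sketch in Python (used only for illustration; the question itself is about a PCRE pattern). It compares the negated character class with a negative lookahead, which is one way to exclude the literal prefix `mailto:`. The sample `href` values, including the `mailto:` address and `index.html`, are made up for the demonstration.

```python
import re

# `[^mailto]` is a negated character class: it matches ONE character
# that is not m, a, i, l, t or o. It does not mean "not the word mailto".
char_class = re.compile(r'href="[^mailto]\S+"')

# A negative lookahead rejects the literal prefix "mailto:" instead.
lookahead = re.compile(r'href="(?!mailto:)\S+"')

# Hypothetical sample attributes, chosen by their first character.
samples = [
    'href="http://www.polskieszlaki.pl"',  # starts with "h"
    'href="mailto:someone@example.com"',   # starts with "m"
    'href="index.html"',                   # starts with "i"
]

for s in samples:
    # Print whether each pattern finds a match in the sample.
    print(s, bool(char_class.search(s)), bool(lookahead.search(s)))
```

The third sample is the interesting one: the character class rejects it only because it happens to start with `i`, while the lookahead accepts it. Matching `https?` explicitly, as suggested in the comment above, avoids the problem in a different way.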

2 Answers

-1

If you are just trying to retrieve the query parameters, try:

a[\s]+href="[^mailto][\S]+polskieszlaki\.pl(.*)(?:\?(?<param>.*))\"

You can then extract the `param` group.

Or, in a simpler form without the named and non-capturing groups:

a[\s]+href="[^mailto][\S]+polskieszlaki\.pl(.*)(\?(.*))\"
Eduardo
-1

I think what you want is this regex:

a[\s]+href="[^mailto][\S]+polskieszlaki\.pl(?:(.*))?(?:(\?)(.*))?\"

Here `(?: ... )` is a non-capturing group: it groups part of the pattern without creating a capture.
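
One common reason an optional `\?` appears to have no effect is a greedy `.*` or `\S+` in front of it: the greedy part swallows the `?`, so the optional group always matches empty. Below is a rough sketch of one way around that, written in Python for illustration rather than as a drop-in fix for the original pattern. It assumes the path is matched with `[^"?]*` so it cannot consume the `?`. The `LINK` pattern, the `add_param` helper, the `ref=abc` parameter and the third sample link with `?id=1` are all hypothetical.

```python
import re

# Hypothetical pattern: scheme, a host ending in polskieszlaki.pl, an optional
# path, then an OPTIONAL query string inside a non-capturing group.
# [^"?]* stops at the first "?" so the path cannot swallow the query.
# Fragments ("#...") are not handled in this sketch.
LINK = re.compile(
    r'href="(?P<base>https?://[^"/?]*polskieszlaki\.pl[^"?]*)'
    r'(?:\?(?P<query>[^"]*))?"'
)

def add_param(html, param):
    """Append `param` (e.g. "ref=abc") to every matching link."""
    def repl(m):
        query = m.group("query")  # None when the link has no "?"
        # Use "&" only when a non-empty query string is already present.
        new_query = f"{query}&{param}" if query else param
        return f'href="{m.group("base")}?{new_query}"'
    return LINK.sub(repl, html)

html = (
    '<a href="http://www.polskieszlaki.pl">a</a> '
    '<a href="http://www.polskieszlaki.pl/wawel.htm">b</a> '
    '<a href="http://www.polskieszlaki.pl/x.htm?id=1">c</a> '
    '<a href="mailto:someone@example.com">d</a>'
)
result = add_param(html, "ref=abc")
print(result)
```

Requiring `https?://` at the start leaves the `mailto:` link untouched without any `[^mailto]` trick. The same idea should carry over to PHP with `preg_replace_callback`, since PCRE also accepts the `(?P<name>...)` syntax, though I have not run it against the regex101 sample.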