-1

I have a regex that is supposed to match URLs which contain an & (ampersand) and do not have a quote (double or single) at the end.

The regex I made:

([^'` "\n]+)\.([^ \n]+)&([^ "`'\n]+)(?!["'])

But instead of rejecting the URL, it just drops the last character and matches the rest of the URL.

https://regex101.com/r/vpmqZH/1

Take this example:

google.com/cool?cool1=yes&cool2=no&cool3=no"

The URL should not match, as it has a " at the end,

but the regex just leaves out the final 'o' and matches the rest of the URL.

All I want is this: if a quote is present at the end, don't match the URL at all.
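
The behavior can be reproduced with a short JavaScript sketch. The variable names here are illustrative only. The final lookahead is checked at a single position, so the engine appears to satisfy it simply by giving back one character:

```javascript
// Original pattern from the question, unchanged.
const originalPattern = /([^'` "\n]+)\.([^ \n]+)&([^ "`'\n]+)(?!["'])/;

// Sample input ending in a double quote.
const quotedUrl = 'google.com/cool?cool1=yes&cool2=no&cool3=no"';

const originalMatch = quotedUrl.match(originalPattern);

// The last group backtracks from "cool3=no" to "cool3=n", because the
// character after "n" is "o", which is not a quote.
console.log(originalMatch[0]);
// google.com/cool?cool1=yes&cool2=no&cool3=n
```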

Mohit Kumar
  • Could you share the regex101.com link with your example? Or paste the text as plain text instead of an image? This would help us find the solution. – Patrick Janser May 09 '22 at 12:40
  • https://regex101.com/r/vpmqZH/1 this is the url – Mohit Kumar May 09 '22 at 12:42
  • Try matching what you do not need and match and *capture* what you need, see ``[^'`"\s]+\.\S+&[^"`'\s]+["']|([^'`"\s]+)\.(\S+)&([^"`'\s]+)`` [demo](https://regex101.com/r/vpmqZH/3). Where no substring is captured, omit those matches. – Wiktor Stribiżew May 09 '22 at 13:06
  • What programming language are you using? We need to code a real example. There are subtle differences between regex in different languages and different tools available. –  May 09 '22 at 13:35
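
The "match what you do not need, capture what you need" idea from the comment above could look roughly like this in JavaScript. This is only a sketch: the sample text and variable names are assumptions, and the filter step is one possible way to omit the matches where nothing was captured.

```javascript
// First alternative: a URL followed by a quote (matched, nothing captured).
// Second alternative: the URL we actually want (three capture groups).
const skipOrCapture = /[^'`"\s]+\.\S+&[^"`'\s]+["']|([^'`"\s]+)\.(\S+)&([^"`'\s]+)/g;

// Hypothetical sample: one quoted URL, one unquoted URL.
const sampleText =
  'google.com/cool?cool1=yes&cool2=no&cool3=no" and google.com/cool?a=1&b=2 momdata';

// Keep only matches where group 1 took part, i.e. the second alternative.
const wantedUrls = [...sampleText.matchAll(skipOrCapture)]
  .filter((m) => m[1] !== undefined)
  .map((m) => m[0]);

console.log(wantedUrls);
// [ 'google.com/cool?a=1&b=2' ]
```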

2 Answers

2

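In your pattern the lookahead is only checked once, at the very end, so the engine just backtracks one character (dropping the final o) to a position that is not followed by a quote. You need to make the lookahead active on the whole part after the ampersand, and the match has to stop at a proper boundary. For that boundary we then have the option of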

  • $ end of the line
    or
  • (?=\s) positive lookahead for a whitespace character.

([^' "\n]+)\.([^ \n]+)&((?!["'])[^ "'\n])+($|(?=\s))

See https://regex101.com/r/6ZGpSX/1
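
Since the asker mentions JavaScript in the comments, here is a minimal sketch of how this pattern might be used there. The sample lines and variable names are made up for illustration, and the g and m flags are an assumption (m lets $ match at the end of every line, not only at the end of the input).

```javascript
// Pattern from this answer, with the final group closed.
const boundaryPattern = /([^' "\n]+)\.([^ \n]+)&((?!["'])[^ "'\n])+($|(?=\s))/gm;

// Hypothetical test lines: quoted URL, URL followed by a word, URL at end.
const lines = [
  'google.com/cool?cool1=yes&cool2=no&cool3=no"',
  'see google.com/cool?cool1=yes&cool2=no momdata',
  'google.com/cool?a=1&b=2',
].join('\n');

// Use the whole match (m[0]); group 3 is repeated, so it only keeps
// the last character it matched.
const boundaryMatches = [...lines.matchAll(boundaryPattern)].map((m) => m[0]);

console.log(boundaryMatches);
// [ 'google.com/cool?cool1=yes&cool2=no', 'google.com/cool?a=1&b=2' ]
```

The quoted URL on the first line produces no match at all, because no position inside it is followed by whitespace or the end of the line.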

  • but this won't match https://regex101.com/r/nCEXVY/1 case – Mohit Kumar May 09 '22 at 12:46
  • You are only matching when nothing follows the URL other than \n. After the URL there can be a space and then another word, e.g. "hi this is the url {{url}} hi this is url". – Mohit Kumar May 09 '22 at 12:47
  • Then it will match a space also in this case `google.com/cool?cool1=yes&cool2=no&cool3=no momdata` https://regex101.com/r/WmDQKy/1 – Mohit Kumar May 09 '22 at 13:32
  • What programming language are you using? We need to code a real example. There are subtle differences between regex in different languages and different tools available. –  May 09 '22 at 13:35
  • JavaScript. I added the positive lookahead and it worked: ``([^'` "\n]+)\.([^ \n]+)&((?!["'])[^ "`'\n])+($|(?=\s))`` – Mohit Kumar May 09 '22 at 13:41
  • Can you update your answer with the final version and explain it a little? It will be easier for others to understand. By the way, thanks a lot for helping. – Mohit Kumar May 09 '22 at 13:41
1

If you only need the match, you can omit the capture groups and use negated character classes. You should also omit the backtick ` from the negated character class if you want to allow it in the match.

[^'"\s.]+\.[^\s'"&]+&[^\s"']+(?!\S)

Explanation

  • [^'"\s.]+ Match 1+ non-whitespace chars other than " ' .
  • \. Match a dot
  • [^\s'"&]+ Match 1+ non-whitespace chars other than " ' &
  • & Match a literal &
  • [^\s"']+ Match 1+ non-whitespace chars other than " '
  • (?!\S) Assert a whitespace boundary to the right

See a regex demo.
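
A minimal JavaScript sketch of this pattern in use. The sample lines and variable names are assumptions for illustration, and the g flag is added only so that all matches can be collected.

```javascript
// Pattern from this answer: no capture groups, whitespace boundary at the end.
const matchOnlyPattern = /[^'"\s.]+\.[^\s'"&]+&[^\s"']+(?!\S)/g;

// Hypothetical test lines: quoted URL, URL followed by a word, URL at end.
const testLines = [
  'google.com/cool?cool1=yes&cool2=no&cool3=no"',
  'see google.com/cool?cool1=yes&cool2=no momdata',
  'google.com/cool?a=1&b=2',
].join('\n');

// String.prototype.match with the g flag returns all full matches
// (or null when there are none).
const matchOnlyResults = testLines.match(matchOnlyPattern) ?? [];

console.log(matchOnlyResults);
// [ 'google.com/cool?cool1=yes&cool2=no', 'google.com/cool?a=1&b=2' ]
```

Because (?!\S) succeeds only before whitespace or at the end of the input, the URL followed by a quote cannot match at any length.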

The fourth bird