
The regex pattern must match all instances of a specified word, not only where it stands alone but also where it appears inside a longer word.

E.g. searching for media should also match the occurrence in mediator.
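The in-word requirement can be illustrated with a small sketch. It uses Python's standard `re` module purely for demonstration; the sample text is an assumption and not part of the original question. It shows that a plain pattern finds the word inside longer words, whereas a word-boundary pattern such as `\bmedia\b` would not.

```python
import re

# Hypothetical sample text, chosen only to illustrate the difference.
text = "media mediator multimedia"

# Plain pattern: matches the word wherever it occurs, including inside longer words.
in_word = [m.start() for m in re.finditer(r"media", text)]

# Word-boundary pattern: matches only the standalone word.
whole_word = [m.start() for m in re.finditer(r"\bmedia\b", text)]

print(in_word)     # start offsets of all occurrences
print(whole_word)  # start offsets of standalone occurrences only
```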

There are a few exceptions. If the word is within a URL or a font-family declaration, the match should be skipped. This is the code I have come up with so far, but I am missing something, as it skips everything.

(?:font-family:|https?:\/\/)[^\s\'";}]*(*SKIP)(*FAIL)(media)

The above can be tested at Regex101.

bobble bubble
kole23
  • Please include a sample of text from which you are trying to extract the matches. – Tim Biegeleisen Oct 25 '19 at 10:43
  • @TimBiegeleisen please see the url – kole23 Oct 25 '19 at 10:44
  • You need a `|` after SKIP FAIL and perhaps also match media on the left side to make sure it is part of the font-family or https? `(?:font-family:\s*|https?:\/\/)[^\s'";}]*\bmedia\b(*SKIP)(*FAIL)|\bmedia\b` See https://regex101.com/r/HCZ473/1 but I am not sure this will cover all the ways you might specify `font-family:` – The fourth bird Oct 25 '19 at 10:47
  • @Thefourthbird thanks so much for your help, obviously I omitted the | operator. I came up with this regex https://regex101.com/r/5CX3Ea/3/ but it still catches the "font-family: media" – kole23 Oct 25 '19 at 10:50
  • @kole23 Did you try this pattern? https://regex101.com/r/jAjmUK/1 There are 1+ spaces after font-family: – The fourth bird Oct 25 '19 at 10:54
  • Looks like you want to [skip the stuff inside braces](https://regex101.com/r/5CX3Ea/4). – bobble bubble Oct 25 '19 at 11:21

1 Answer


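You need to add a pipe (alternation) after (*SKIP)(*FAIL); without it, the pattern always reaches (*FAIL) before (media) and so can never match. Also note that there is a space after font-family: which the pattern should account for.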

Note that you don't have to escape the ' in the character class, and if you change the delimiter to something other than / you don't have to escape the slashes either. You can also omit the capturing group around media.

(?:font-family:\h*|https?:\/\/)[^\s'";}]*(*SKIP)(*FAIL)|\bmedia\b
  • (?: Non capturing group
    • font-family:\h* Match font-family: and 0+ horizontal whitespace chars
    • | Or
    • https?:\/\/ Match http, an optional s, and ://
  • ) Close non capturing group
  • [^\s'";}]* Match 0+ times any char except ' " ; } or a whitespace char
  • (*SKIP)(*FAIL) Fail the match at this point and resume searching after the consumed text, so nothing inside it is matched
  • | Or
  • \bmedia\b Match media between word boundaries
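The same idea can be sketched in an engine without backtracking control verbs. Python's standard `re` module does not support (*SKIP)(*FAIL), so this sketch swaps in a different technique: the left alternative consumes the text to ignore, the right alternative captures the word, and only matches with a filled capture group are kept. The function name `find_word`, the sample string, and the use of plain `media` (to also match in-word content, as the question asks) are assumptions for illustration, and `[ \t]` stands in for `\h`.

```python
import re

# Left alternative: consume a URL or a font-family value so it is never searched.
# Right alternative: capture the word itself (no word boundaries, so in-word hits count).
PATTERN = re.compile(r"""(?:font-family:[ \t]*|https?://)[^\s'";}]*|(media)""")


def find_word(text):
    """Return (start, end) spans of 'media' outside URLs and font-family values."""
    return [m.span(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


# Hypothetical sample covering a plain word, an in-word hit, a URL,
# a font-family declaration, and a CSS media query.
sample = "media mediator http://example.com/media.png font-family: media; @media print"
spans = find_word(sample)
print(spans)
```

As with the PCRE pattern, a font-family value is only consumed up to the first whitespace, quote, semicolon, or closing brace, so a declaration listing several fonts would not be fully covered.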

Regex demo

The fourth bird