Regex to find all URLs that contain "logo" and end with an image extension

Question

I'm trying to find logos on websites.

XPath (//img[contains(@src,"logo")]/@src) works if the logo is inside an <img> tag, but some websites define their logo in a style:

<html>
   <head>
      <style>
         someclass {
            background-image: url("/css/images/logo2.jpg");
            background-color: #cccccc;
         }
      </style>
   </head>
   
   <body>
      <h1>Hello World!</h1>
   </body>
</html>

So I'm trying to build a regex for such cases:

[\"\']([\a-zA-Z0-9-_]*logo[a-zA-Z0-9\-_]*\.(?:png|jpg|jpeg)).*?"

For example, this captures "/e/logo_adsada.jpg?size=400", but it also matches the characters that follow.

Here is the example:

https://regex101.com/r/rV3oP8/160

Do you know what is wrong?

`the image can be found at "/e/logo.jpg"` <-- You'll match this but shouldn't. Good luck parsing HTML/CSS with RegEx. — Cid, Aug 28 '20 at 11:29

score 0 · Answer 1 · answered Aug 28 '20 at 11:46

I believe your issue is greediness. Some regex engines expose this as a flag; on the site you linked (regex101), you can enable the "Ungreedy" flag.

To quote another question, which in turn quotes Regular expression:

The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex.

By using a lazy quantifier, the expression tries the minimal match first.
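
As a rough illustration of the difference (a minimal sketch in Python's `re` module; the sample text and variable names are made up for this example), the same pattern with a greedy and a lazy quantifier behaves like this:

```python
import re

# Hypothetical sample text with two quoted strings on one line.
text = 'a "first" and "second" b'

# Greedy: .* grabs as much as possible, so the match runs to the LAST quote.
greedy = re.search(r'".*"', text).group(0)

# Lazy: .*? grabs as little as possible, so the match stops at the NEXT quote.
lazy = re.search(r'".*?"', text).group(0)

print(greedy)  # "first" and "second"
print(lazy)    # "first"
```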

score 0 · Answer 2 · answered Aug 28 '20 at 13:42


The regex below should help:

["']([\a-zA-Z0-9-_]*?logo[a-zA-Z0-9\-_]*?\.(?:png|jpg|jpeg)).*?['"]

Regex demo
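
A minimal sketch of how this pattern might be used from Python, assuming the page source is already available as a string. The sample strings and the names `LOGO_RE`, `css`, and `tag` are hypothetical; only capture group 1 is kept, so anything after the extension (such as a query string) is matched but not captured.

```python
import re

# The pattern from this answer, unchanged. Note that the range \a-z in the
# first character class is very broad: it also covers '/', '.', and quotes.
LOGO_RE = re.compile(
    r'''["']([\a-zA-Z0-9-_]*?logo[a-zA-Z0-9\-_]*?\.(?:png|jpg|jpeg)).*?['"]'''
)

# Hypothetical inputs modelled on the examples in the question.
css = 'background-image: url("/css/images/logo2.jpg");'
tag = '<img src="/e/logo_adsada.jpg?size=400">'

# Group 1 holds the path up to and including the image extension.
css_hit = LOGO_RE.search(css).group(1)
tag_hit = LOGO_RE.search(tag).group(1)

print(css_hit)  # /css/images/logo2.jpg
print(tag_hit)  # /e/logo_adsada.jpg
```

Because the pattern only requires surrounding quotes, it will also match quoted paths in ordinary text, as the comment on the question points out.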

answered by Liju
