I'm trying to find logos on websites.
The XPath //img[contains(@src,"logo")]/@src works when the logo is inside an <img> tag, but some websites define their logo in a stylesheet instead:
<html>
<head>
<style>
.someclass {
background-image: url("/css/images/logo2.jpg");
background-color: #cccccc;
}
</style>
</head>
<body>
<h1>Hello World!</h1>
</body>
</html>
So I'm trying to build a regex for such cases:
[\"\']([\a-zA-Z0-9-_]*logo[a-zA-Z0-9\-_]*\.(?:png|jpg|jpeg)).*?"
For example, it captures "/e/logo_adsada.jpg?size=400", but it also picks up the characters that follow.
Here is the example:
https://regex101.com/r/rV3oP8/160
Do you know what is wrong?
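One likely cause can be sketched in Python, assuming the pattern is run with the standard `re` module (other regex flavors may behave differently). Inside the first character class, `\a-z` is read as a range from the bell character (0x07) up to `z`, which covers quotes, spaces and slashes, and the trailing `.*?"` stretches the overall match to the next double quote. The inputs `img_tag` and `css_rule` and the `tightened` pattern are hypothetical, chosen only for illustration, not a definitive fix.

```python
import re

# Pattern exactly as posted in the question.
original = re.compile(
    r'[\"\']([\a-zA-Z0-9-_]*logo[a-zA-Z0-9\-_]*\.(?:png|jpg|jpeg)).*?"'
)

# Hypothetical tightened pattern (an assumption, not the asker's code):
# only word characters, "/", "." and "-" are allowed in the path,
# and nothing is consumed after the file extension.
tightened = re.compile(r'["\']([\w/.-]*logo[\w-]*\.(?:png|jpe?g))')

# Illustrative inputs (assumed, not taken from a real site).
img_tag = '<img class="x" src="/e/logo_adsada.jpg?size=400" alt="">'
css_rule = 'background-image: url("/css/images/logo2.jpg");'

loose = original.search(img_tag)
# group(0) is the whole match; group(1) is only the parenthesised part.
print(repr(loose.group(0)))
print(repr(loose.group(1)))

# The tightened pattern skips the class attribute and stops at the extension.
strict_img = tightened.search(img_tag).group(1)
strict_css = tightened.search(css_rule).group(1)
print(strict_img)
print(strict_css)
```

With these inputs, the original pattern starts matching at the quote of the `class` attribute and its overall match runs through `?size=400"`, while the tightened pattern returns only the path ending in `.jpg`. If the query string is wanted as well, the capture group would need an optional part for it.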