
I have an HTML page in an iframe that is provided by a user. I want a list of all URLs referenced anywhere in this HTML page, including URLs in CSS files and in style attributes.

For example, running it on this code:

<div>
    <style>
        ul {
            background: url("exampleImage.png") #00D no-repeat fixed;
        }
    </style>
    <ul style="list-style: square url(http://www.example.com/redball.png);">
        <li><a href="http://www.example.com/foobar">test</a></li>
    </ul>
</div>

should return these URLs:

exampleImage.png
http://www.example.com/redball.png
http://www.example.com/foobar
Florian Dietz
  • You should use a regex for that kind of task: https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285 – Jacob-Jan Mosselman Aug 23 '18 at 17:16
  • That sounds unsafe. It will work most of the time, but how can you rule out special cases? What if the link doesn't start with http or https? What if something looks like a link, but it's actually fine because it's quoted? How could I be sure I didn't miss any possible special case? – Florian Dietz Aug 23 '18 at 19:06

1 Answer


Sounds like a great opportunity to use a regex:

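var re = /(https?:.*?)[\)"]/g; // captures http(s) URLs up to a closing ) or "
var s = document.body.innerHTML; // here goes your html element
var m;

do {
    m = re.exec(s);
    if (m) {
        // m[1] is the captured URL; the pattern has only one capture group
        console.log(m[1]);
    }
} while (m);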
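
The pattern above only finds strings that start with http or https, so it would skip the relative exampleImage.png from the question. One way to widen it is to look at CSS url(...) values and at href/src attributes instead. The following is a minimal sketch under those assumptions, not a complete solution: extractUrls is a hypothetical helper name, it works on a plain HTML string, and it will still miss cases such as srcset, @import "file.css", or URLs inside linked CSS files.

```javascript
// Hypothetical helper (not from the original answer): collects URLs from
// CSS url(...) values and from quoted href/src attributes in an HTML string.
// This is a sketch, not an HTML or CSS parser.
function extractUrls(html) {
    var urls = [];
    // CSS url(...) with optional single or double quotes around the value
    var cssRe = /url\(\s*(['"]?)([^'")]*?)\1\s*\)/gi;
    // href="..." or src="..." with a quoted value
    var attrRe = /\b(?:href|src)\s*=\s*(['"])(.*?)\1/gi;
    var m;

    while ((m = cssRe.exec(html)) !== null) {
        if (m[2]) {
            urls.push(m[2]); // m[2] is the value between the parentheses
        }
    }
    while ((m = attrRe.exec(html)) !== null) {
        if (m[2]) {
            urls.push(m[2]); // m[2] is the attribute value without quotes
        }
    }
    return urls;
}
```

Called on the question's sample markup, this should return the three expected URLs, but anything outside these two patterns still needs its own rule or a real parser.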

Thanks to lawnsea

Facundo Petre
  • That sounds unsafe. It will work most of the time, but how can you rule out special cases? What if the link doesn't start with http or https? What if something looks like a link, but it's actually fine because it's quoted? How could I be sure I didn't miss any possible special case? – Florian Dietz Aug 23 '18 at 19:06
  • Tell us more about your special cases and we'll try to cover them all in another regex. Remember, a regex only matches the patterns it is written for, so if you'd like to cover all your cases you will need at least one example of each possible pattern of your URLs. – Facundo Petre Aug 24 '18 at 02:47
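
The "one example per pattern" idea can be sketched as a small check that runs a candidate regex against a list of examples and reports the ones it misses. The examples below are taken from the question's sample markup; findMissedExamples is a hypothetical helper name, and the check assumes the URL is in the regex's first capture group.

```javascript
// One example per URL pattern, taken from the question's sample markup.
var examples = [
    { input: 'url("exampleImage.png")', expected: 'exampleImage.png' },
    { input: 'url(http://www.example.com/redball.png);', expected: 'http://www.example.com/redball.png' },
    { input: '<a href="http://www.example.com/foobar">', expected: 'http://www.example.com/foobar' }
];

// Hypothetical helper: returns the examples whose expected URL never
// shows up as capture group 1 of the given regex.
function findMissedExamples(re, cases) {
    return cases.filter(function (c) {
        // Use a fresh global copy so lastIndex state is not shared between cases
        var flags = re.flags.indexOf('g') === -1 ? re.flags + 'g' : re.flags;
        var globalRe = new RegExp(re.source, flags);
        var m;
        while ((m = globalRe.exec(c.input)) !== null) {
            if (m[1] === c.expected) {
                return false; // found, so this example is not missed
            }
            if (m[0] === '') {
                globalRe.lastIndex++; // avoid looping on zero-length matches
            }
        }
        return true;
    });
}

// Check the regex from the answer against the examples.
var missed = findMissedExamples(/(https?:.*?)[\)"]/g, examples);
console.log(missed.map(function (c) { return c.expected; }));
```

Each new special case would then be added to the list as one more example, which makes it visible when a change to the regex breaks a pattern that used to work.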