I'm trying my hand at Scala regex to find the img src in a web page.
With the following code and some mock content, I don't get any match. What am I missing?
def imgSrc(content: String) = {
  val src = ".*<img[\\w\\s]+src\\s*=\\s*(\"\\w+\")[\\w\\s]+/>.*".r
  val formattedContent = content.replaceAll(lineSeparator, "")
  (src findAllIn formattedContent).toList
}
Test case:
"Method imgSrc" should "find src attributes of all img tags in mock web page" in {
  val content = """<a href="#search" onclick="_gaq.push(['_trackPageview', '/search']);
return Manager.createHistoryAndLoad(true);">
<img src="ajaxsolr/images/centralRepository_logo.png" alt="The Central Repository" />
</a>"""
  imgSrc(content) should contain("ajaxsolr/images/centralRepository_logo.png")
}
Also, it'd be nice to be able to match the multiline input without removing the newlines. I read this and this but couldn't get it to work.
Note: This is just a learning exercise. I'm aware and generally agree that one shouldn't use regex to parse HTML.
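For comparison, here is a minimal sketch of one direction that seems to work on the mock content above. It is not a general HTML parser. A few things in the original pattern look like they prevent a match: \w+ inside the quotes cannot match the / and . in the path, [\w\s]+ after the value cannot match the = and quotes of the alt attribute, and findAllIn returns the whole matched text rather than the capture group. The object name ImgSrcSketch and the value name ImgSrc are made up for this illustration, and the pattern assumes quoted attribute values and no > character inside any attribute that comes before src.

```scala
import scala.util.matching.Regex

// Hypothetical helper object, named only for this sketch.
object ImgSrcSketch {
  // No leading/trailing ".*", so each <img> tag can produce its own match.
  // [^>]*? and \s both match newlines, so the input does not need flattening.
  // Group 1 captures the attribute value without the surrounding quotes.
  private val ImgSrc: Regex =
    """(?i)<img\b[^>]*?\ssrc\s*=\s*["']([^"']*)["']""".r

  // Returns the src value of every img tag found, in document order.
  def imgSrc(content: String): List[String] =
    ImgSrc.findAllMatchIn(content).map(_.group(1)).toList
}
```

Called as ImgSrcSketch.imgSrc(content) with the multiline mock content from the test case, this should give a one-element list containing ajaxsolr/images/centralRepository_logo.png, with the newlines left in place.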