Below is my code. It attempts to extract the src attribute of each image tag in an HTML file:
import re
for text in open('site.html'):
    matches = re.findall(r'\ssrc="([^"]+)"', text)
    matches = ' '.join(matches)
print(matches)
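One likely cause, offered as an assumption rather than a confirmed diagnosis: if the final `print` sits outside the loop, `matches` is overwritten on every line, so only the last line's result is printed, and the last line of a full page (typically `</html>`) has no src attribute. A minimal sketch that searches the whole file as one string instead (the helper names `find_img_srcs` and `img_srcs_from_file` are hypothetical, and UTF-8 encoding is assumed):

```python
import re

# Same pattern as the original: whitespace, then src="...", capturing the value
SRC_PATTERN = re.compile(r'\ssrc="([^"]+)"')

def find_img_srcs(html):
    # Search the entire document at once, so matches from every line are kept
    return SRC_PATTERN.findall(html)

def img_srcs_from_file(path):
    # Read the whole file into one string instead of looping line by line
    with open(path, encoding='utf-8') as f:
        return find_img_srcs(f.read())
```

Usage: `print(' '.join(img_srcs_from_file('site.html')))`. Note that this pattern still matches src on any tag (for example `<script>`), not only `<img>`.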
The problem is that when I put in something like:
<img src="asdfasdf">
it works, but when I put in an entire HTML page, it returns nothing. Why does that happen, and how do I fix it?
site.html is just the HTML source of an ordinary web page. I want the script to ignore everything else and print only the src value of each image. To see what site.html might contain, open any basic HTML web page and copy its page source.
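Regular expressions are fragile on real HTML (single-quoted attributes, tags split across lines, src on non-image tags). As an alternative sketch, this swaps in the standard library's `html.parser.HTMLParser` and collects src only from `<img>` tags; the names `ImgSrcParser` and `img_srcs` are hypothetical:

```python
from html.parser import HTMLParser

class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs.
        # Self-closing <img ... /> also reaches here via the default handle_startendtag.
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value is not None:
                    self.srcs.append(value)

def img_srcs(html):
    # Feed the whole document at once, then return the collected src values
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    return parser.srcs
```

Usage: read site.html into a string with `open('site.html', encoding='utf-8').read()` and print `' '.join(img_srcs(...))` on it.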