I need to access a website using urllib and then find all the images on that web page. I believe I have successfully written code to access the website; I just need to search the page now.

I will be able to create the regular expression, but I need assistance on how an image would appear in HTML format so I know how to create a regular expression to search for this image.

The code I have posted does not include the regular expression, as I have not written it yet; I included the code only to show what I have so far. Just looking for a little guidance. Thanks for all the help!

    import urllib.request
    import ssl

    website = 'https://www.google.com'

    html = urllib.request.urlopen(website)
    for line in html:
        print(line)
KeysMcGee
  • Check out BeautifulSoup. ref: https://stackoverflow.com/search?q=%5Bpython-3.x%5D+BeautifulSoup – Life is complex Mar 31 '19 at 20:53
  • https://www.w3schools.com/html/html_images.asp – ForceBru Mar 31 '19 at 20:53
  • I don't know what you mean by "how an image would appear in HTML format". But you wouldn't be trying to [parse HTML with regex](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) anyway, surely? – Daniel Roseman Mar 31 '19 at 20:54
  • Can you please be more specific, because your previous question was very similar to this one? https://stackoverflow.com/questions/55369233/urllib-request-post-data-should-be-bytes-an-iterable-of-bytes-or-a-file-objec – Life is complex Mar 31 '19 at 21:44

1 Answer

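    from bs4 import BeautifulSoup

    # `html` is the response object returned by urllib.request.urlopen() in the question
    soup = BeautifulSoup(html, 'html.parser')
    for img in soup.find_all('img'):
        print(img)  # print is a function in Python 3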

See https://www.crummy.com/software/BeautifulSoup/bs4/doc/#quick-start.
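
To address the "how does an image appear in HTML" part: an image is normally an `<img>` tag whose `src` attribute holds the image URL, for example `<img src="/images/logo.png" alt="Logo">`. If installing BeautifulSoup is not an option, here is a rough sketch that uses the standard library's `html.parser` instead (a swap for bs4, not the same approach as above). The names `ImageCollector`, `find_image_urls`, and the `sample` markup are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL
                self.image_urls.append(urljoin(self.base_url, src))


def find_image_urls(html_text, base_url):
    """Return absolute URLs for all <img> tags found in html_text."""
    parser = ImageCollector(base_url)
    parser.feed(html_text)
    return parser.image_urls


# Made-up sample markup; no network access needed
sample = '<p><img src="/images/logo.png" alt="Logo"><img src="https://example.com/a.jpg"/></p>'
print(find_image_urls(sample, "https://www.google.com"))
```

For a live page, you would pass in the decoded response body, e.g. `urllib.request.urlopen(website).read().decode('utf-8', errors='replace')`. Note that this only picks up plain `<img src=...>` tags; images loaded via CSS or JavaScript will not show up.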