Searching with regular expressions via urllib.request

Question

I need to access a website using urllib and then search that page for all of the images on it. I believe I have successfully written the code to access the website; now I just need to search it.

I can write the regular expression myself, but I need help understanding how an image appears in HTML, so that I know what the regular expression should match.

The code I have posted does not include the regular expression because I have not written it yet; I included it only for context. I'm just looking for a little guidance. Thanks for all the help!

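    import urllib.request

    website = 'https://www.google.com'

    # urlopen() returns a response object whose body is raw bytes
    with urllib.request.urlopen(website) as response:
        # decode with the charset the server declares, falling back to UTF-8
        charset = response.headers.get_content_charset() or 'utf-8'
        html = response.read().decode(charset)

    for line in html.splitlines():
        print(line)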
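To answer the "what does an image look like in HTML" part: an image is normally an `<img>` element, and the image URL is in its `src` attribute, e.g. `<img src="/images/logo.png" alt="Logo">`. The sketch below shows what a regex-based search could look like, assuming that shape. The sample markup and the names `IMG_SRC_RE` and `find_image_sources` are hypothetical, and, as the comments below point out, a regex like this is only a rough heuristic for HTML, not a real parser.

```python
import re

# Hypothetical sample markup: an image in HTML is an <img> element whose
# src attribute holds the image URL (quoted with " or ').
SAMPLE_HTML = '''
<p>Logo: <img src="/images/logo.png" alt="Logo"></p>
<IMG class="photo" src='https://example.com/cat.jpg' />
<img alt="no source">
'''

# Matches an <img ...> tag and captures the value of its src attribute,
# whether it is wrapped in double or single quotes. Rough heuristic only:
# unquoted values, comments and <script> contents are not handled.
IMG_SRC_RE = re.compile(
    r'<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def find_image_sources(html):
    """Return the src value of every <img> tag matched by IMG_SRC_RE."""
    return [match.group(2) for match in IMG_SRC_RE.finditer(html)]


if __name__ == '__main__':
    for src in find_image_sources(SAMPLE_HTML):
        print(src)
```

To run it on a real page, pass in the decoded page text (for example `response.read().decode('utf-8')`) rather than the response object itself, since `re` works on strings.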

Check out BeautifulSoup. ref: https://stackoverflow.com/search?q=%5Bpython-3.x%5D+BeautifulSoup — Life is complex, Mar 31 '19 at 20:53
I don't know what you mean by "how an image would appear in HTML format". But you wouldn't be trying to [parse HTML with regex](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) anyway, surely? — Daniel Roseman, Mar 31 '19 at 20:54
Can you please be more specific? Your previous question was very similar to this one: https://stackoverflow.com/questions/55369233/urllib-request-post-data-should-be-bytes-an-iterable-of-bytes-or-a-file-objec — Life is complex, Mar 31 '19 at 21:44

score 2 · Accepted Answer · answered Mar 31 '19 at 21:13


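    from bs4 import BeautifulSoup

    # html is the page fetched with urllib.request in the question
    soup = BeautifulSoup(html, 'html.parser')
    # find_all('img') returns every <img> tag in the document
    for img in soup.find_all('img'):
        print(img)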

See https://www.crummy.com/software/BeautifulSoup/bs4/doc/#quick-start.
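BeautifulSoup is a third-party package (installed with `pip install beautifulsoup4`). If adding a dependency isn't an option, the standard library's `html.parser.HTMLParser` can collect the same tags. The sketch below is one way to do it; `ImageCollector`, `collect_image_urls` and the sample markup are hypothetical, and resolving relative `src` values with `urllib.parse.urljoin` is an extra step the answer above does not do.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names;
        # attrs is a list of (name, value) pairs
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                # turn relative paths such as /images/logo.png into full URLs
                self.sources.append(urljoin(self.base_url, src))
    # Self-closing tags like <img ... /> go through handle_startendtag,
    # whose default implementation calls handle_starttag.


def collect_image_urls(html, base_url):
    """Parse an HTML string and return the absolute URL of each image."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.sources


SAMPLE_HTML = '''
<p>Logo: <img src="/images/logo.png" alt="Logo"></p>
<IMG class="photo" src='https://example.com/cat.jpg' />
<img alt="no source">
'''

if __name__ == '__main__':
    for url in collect_image_urls(SAMPLE_HTML, 'https://www.example.com/page.html'):
        print(url)
```

With the question's code, you would pass the decoded page text as `html` and the page address (`website`) as `base_url`.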


Kaue Silveira


This is an excellent answer. Also, read this: https://stackoverflow.com/a/1732454/64004 – gahooa Mar 31 '19 at 21:50
