
I'm new to coding and web scraping, and I'm teaching myself with videos and tutorials. I'm trying to retrieve the picture of a sudoku from an HTML page in a Python notebook. I can get all the way inside the tags to where the PNG is, but I don't know what to call to return it as a PNG in Python.

I'm using Python 3.6.5

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

req = Request('http://dailysudoku.com/sudoku/archive/2019/08/2019-08-28.shtml', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
plain_text= BeautifulSoup(webpage, 'html.parser')
table= plain_text.find('table', id='mainLayout')
for column in (table.find_all('td',id="centerTd")):
    for column in(column.find('center')):
       print(column)

That's as far as I can get, which shows that one of the columns is

<img alt="" src="/sudoku/png/2019/08/2019-08-28.png"/>

and I attempted to get it by doing

    column.find_all('img',src="/sudoku/png/2019/08/2019-08-28.png")

but img is not iterable.

Any help is much appreciated, Thanks!

1 Answer


You can select the img inside the center tag directly and extract its src like this:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

req = Request('http://dailysudoku.com/sudoku/archive/2019/08/2019-08-28.shtml', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
soup = BeautifulSoup(webpage, 'html.parser')

img_url = 'http://dailysudoku.com' + soup.select_one('center > img')['src'].replace('\n', '')  # strip newlines from the src value

print(img_url)
#http://dailysudoku.com/sudoku/png/2019/08/2019-08-28.png

To display directly inside a Jupyter notebook, you can add this:

from IPython.display import Image
Image(url=img_url)
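
If the goal is to have the PNG itself in Python rather than only its URL, one possible next step is to download the bytes and write them to a file. The following is a minimal sketch, not part of the original answer: the helper names absolute_image_url and download_png and the output filename sudoku.png are made up for illustration, and the download call is left commented out because it needs network access.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Every PNG file starts with these 8 bytes.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def absolute_image_url(page_url, src):
    """Resolve a (possibly relative) src attribute against the page URL."""
    return urljoin(page_url, src.replace('\n', ''))


def download_png(img_url, filename):
    """Fetch img_url, check that it looks like a PNG, and save it to filename."""
    req = Request(img_url, headers={'User-Agent': 'Mozilla/5.0'})
    data = urlopen(req).read()
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError('response does not look like a PNG')
    with open(filename, 'wb') as f:
        f.write(data)
    return data


page_url = 'http://dailysudoku.com/sudoku/archive/2019/08/2019-08-28.shtml'
img_url = absolute_image_url(page_url, '/sudoku/png/2019/08/2019-08-28.png\n')
print(img_url)

# To actually fetch and save the file (requires network access):
# png_bytes = download_png(img_url, 'sudoku.png')
```

Using urljoin instead of string concatenation is a design choice here: it also handles src values that are relative to the page's directory rather than to the site root.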
drec4s
  • Thank you so much @drec4s, that's exactly what I needed. As an extra, not topic-related question, and if it's not too much trouble, could you explain the steps that line of code goes through? Why don't you have to go tag by tag, and how come you don't need to access the main layout table? – Pablo De Juan Aug 29 '19 at 10:09
  • That's the advantage that these XML/HTML parsing libraries (e.g. BeautifulSoup) offer. You can directly query the structure of the file for the element you want. I recommend, however, that you read more about CSS selectors, as they can simplify these searches a lot (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors) – drec4s Aug 29 '19 at 10:16
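
To illustrate the point made in the comments, here is a small sketch comparing tag-by-tag navigation with a single CSS selector. The HTML string is a simplified, made-up stand-in for the real page (only the ids and the src value come from the question), so the live page may well be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical, stripped-down version of the page layout.
html = """
<table id="mainLayout"><tr>
  <td id="centerTd">
    <center><img alt="" src="/sudoku/png/2019/08/2019-08-28.png"/></center>
  </td>
</tr></table>
"""
soup = BeautifulSoup(html, 'html.parser')

# Tag by tag: each find() narrows the search to one nested element.
table = soup.find('table', id='mainLayout')
cell = table.find('td', id='centerTd')
center = cell.find('center')
img_step = center.find('img')

# One query: 'center > img' matches an <img> that is a direct child of a
# <center>, wherever that <center> sits in the document.
img_css = soup.select_one('center > img')

print(img_step['src'])
print(img_css['src'] == img_step['src'])
```

Both approaches reach the same element in this snippet. The selector works without naming the table because the search covers the whole parsed tree, so the enclosing tags only matter if they are needed to tell two candidate matches apart.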