I am trying to retrieve the source URL of an image from its img tag. A snippet of the HTML is below.

            <img alt="Magellan Outdoors Men's Laguna Madre Solid Short Sleeve Fishing Shirt" src="//assets.academy.com/mgen/81/10762881.jpg?is=500,500" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';">

This line repeats across the page's HTML, once for each clothing item, and I want to get the value of the "src" attribute from each img tag. The Python code I have right now prints each img tag in full.

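from bs4 import BeautifulSoup as soup

# Parse the saved page.
with open("Mens_Shirts.html", "r") as men_shirts:
    page_soup = soup(men_shirts, "lxml")

# Collect every <img> tag on the page.
images = page_soup.find_all("img")

# Print each tag in full (this is the current behaviour).
for img in images:
    print(img)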

Result:

<img alt="" src="//content.academy.com/aurora/category/2017/clothing/men/fishingshirts-hd.jpg" width="100%"/>
<img alt="Magellan Outdoors Men's Laguna Madre Solid Short Sleeve Fishing Shirt" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';" src="//assets.academy.com/mgen/81/10762881.jpg?is=500,500"/>
<img alt="Magellan Outdoors Men's Laguna Madre Solid Short Sleeve Fishing Shirt" data-blzexdl="1" data-feo-orig-src="//assets.academy.com/mgen/39/10739939.jpg?is=500,500" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';" src="http://1.resources.www.academy.com.edgekey.net/4/W/zhmM8JXG8.webp"/>
<img alt="Rawlings Men's 3/4 Sleeve T-shirt" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';" src="//assets.academy.com/mgen/59/10137459.jpg?is=500,500"/>
<img alt="BCG Men's Turbo Mesh Short Sleeve T-shirt" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';" src="//assets.academy.com/mgen/12/10740412.jpg?is=500,500"/>
<img alt="Nike Men's Elite Back Stripe T-shirt" onerror="this.onerror=null;this.src='//content.academy.com/weblib/images/coming-soon.jpg';" src="//assets.academy.com/mgen/77/10568677.jpg?is=500,500"/>

I have tried to get the value of "src", but nothing I tried gave the desired output. What is the best way to extract the image source from the "src" attribute? To be more specific, most of the image sources begin with "//assets.academy.com".
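
For reference, here is a minimal sketch of the kind of output I am after. It reads each tag's "src" with Beautiful Soup's `Tag.get` and keeps only the URLs that start with "//assets.academy.com". The function name `image_sources`, the `prefix` parameter, and the inline `SAMPLE_HTML` string are hypothetical and only for illustration. Falling back to "data-feo-orig-src" is an assumption based on the one tag in the result above where "src" was rewritten to a .webp URL. It uses the built-in "html.parser" so the sample runs without lxml.

```python
from bs4 import BeautifulSoup

# Sample markup shortened from the output above (onerror attributes dropped);
# the real input would be the contents of Mens_Shirts.html.
SAMPLE_HTML = """
<img alt="" src="//content.academy.com/aurora/category/2017/clothing/men/fishingshirts-hd.jpg" width="100%"/>
<img alt="Rawlings Men's 3/4 Sleeve T-shirt" src="//assets.academy.com/mgen/59/10137459.jpg?is=500,500"/>
<img alt="Magellan Outdoors Men's Laguna Madre Solid Short Sleeve Fishing Shirt" data-feo-orig-src="//assets.academy.com/mgen/39/10739939.jpg?is=500,500" src="http://1.resources.www.academy.com.edgekey.net/4/W/zhmM8JXG8.webp"/>
"""


def image_sources(html, prefix="//assets.academy.com"):
    """Return the image URLs in `html` that start with `prefix` (hypothetical helper)."""
    page_soup = BeautifulSoup(html, "html.parser")
    sources = []
    for img in page_soup.find_all("img"):
        # Assumption: when data-feo-orig-src is present it holds the original URL,
        # so prefer it over the rewritten src.
        src = img.get("data-feo-orig-src") or img.get("src")
        # Tag.get returns None for a missing attribute, so guard before filtering.
        if src and src.startswith(prefix):
            sources.append(src)
    return sources


sources = image_sources(SAMPLE_HTML)
for src in sources:
    print(src)
```

With the real file, the same function could be called on the text read from Mens_Shirts.html. The banner image from "//content.academy.com" is skipped because it does not match the prefix.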

Abhik Nag