import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

start_url = 'https://www.example.com'
# Download the page and parse it with the lxml parser
downloaded_html = requests.get(start_url)
soup = BeautifulSoup(downloaded_html.text, "lxml")
# Select every <div class="reference-image"> element
full_header = soup.select('div.reference-image')
full_header

The output of the above code is:

[<div class="reference-image"><img src="Content/image/all/reference/c101.jpg"/></div>,
 <div class="reference-image"><img src="Content/image/all/reference/c102.jpg"/></div>,
 <div class="reference-image"><img src="Content/image/all/reference/c102.jpg"/></div>]

I would like to extract the value of each img tag's src attribute, as below:

["Content/image/all/reference/c101.jpg",
 "Content/image/all/reference/c102.jpg",
 "Content/image/all/reference/c102.jpg"]

How can I extract it?

  • https://stackoverflow.com/questions/28212766/extract-string-from-tag-with-beautifulsoup – fernand0 Apr 15 '20 at 16:06
  • Can you clarify what exactly the issue is? Please see How to Ask and the help center's on-topic guidance. – AMC Apr 15 '20 at 17:39
  • This is a blatant duplicate of [Extracting an attribute value with beautifulsoup](https://stackoverflow.com/questions/2612548/extracting-an-attribute-value-with-beautifulsoup). – AMC Apr 15 '20 at 17:41
  • Thank you @JoshuaVarghese, it perfectly does. Also, is there a way to handle this situation with BeautifulSoup's own functionality? – Murat Dikici Apr 20 '20 at 12:59

1 Answer


To get the src values, iterate over the result and read the src attribute of the img tag inside each div:

img_srcs = []
for i in full_header:
    img_srcs.append(i.find('img')['src'])

This gives:

['Content/image/all/reference/c101.jpg', 'Content/image/all/reference/c102.jpg', 'Content/image/all/reference/c102.jpg']
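One caveat with the loop above: find('img') returns None for a div that contains no img tag, so subscripting it would raise a TypeError. Below is a minimal defensive sketch, run against an inline HTML snippet rather than the real page. The snippet (including its empty third div) and the name `div` are made up for illustration, and html.parser is used only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the last div deliberately has no <img>.
html = """
<div class="reference-image"><img src="Content/image/all/reference/c101.jpg"/></div>
<div class="reference-image"><img src="Content/image/all/reference/c102.jpg"/></div>
<div class="reference-image"></div>
"""

soup = BeautifulSoup(html, "html.parser")
full_header = soup.select("div.reference-image")

img_srcs = []
for div in full_header:
    img = div.find("img")  # None when the div has no <img>
    # Skip divs without an image, and images without a src attribute
    if img is not None and img.has_attr("src"):
        img_srcs.append(img["src"])

print(img_srcs)
```

Whether silently skipping such divs is right depends on the page; raising or logging may be preferable if every div is expected to hold an image.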

Here is a one-liner for this:

img_srcs = [i.find('img')['src'] for i in full_header]
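A further option, offered as a sketch rather than as part of the answer above, leans on BeautifulSoup's own CSS selector support: select the img tags directly instead of the enclosing divs. This assumes every wanted image sits somewhere inside a div.reference-image. The inline HTML below stands in for the downloaded page, and html.parser is used only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Stand-in for downloaded_html.text, using the markup shown in the question.
html = """
<div class="reference-image"><img src="Content/image/all/reference/c101.jpg"/></div>
<div class="reference-image"><img src="Content/image/all/reference/c102.jpg"/></div>
<div class="reference-image"><img src="Content/image/all/reference/c102.jpg"/></div>
"""

soup = BeautifulSoup(html, "html.parser")

# "img[src]" matches only <img> tags that carry a src attribute,
# so indexing with ["src"] cannot fail with a KeyError here.
img_srcs = [img["src"] for img in soup.select("div.reference-image img[src]")]

print(img_srcs)
```

Because the selector returns the img tags themselves, divs without an image simply produce no match and need no special handling.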
Joshua Varghese