I have a page that contains this tag:

<img alt="1ee7aca0cf5b0132dd7a005056a9545d" src="http://assets.amuniversal.com/1ee7aca0cf5b0132dd7a005056a9545d">

I know its XPath:

//*[@id="content"]/div[2]/p/a/img

How do I access that tag and get the src of that tag using BeautifulSoup?

Ninjinx

2 Answers

You can try converting your XPath expression into a CSS selector, then pass it to BeautifulSoup's select() method, which accepts a CSS selector and returns a list of matching elements:

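from bs4 import BeautifulSoup
soup = BeautifulSoup("your html source", "html.parser")  # name the parser explicitly
result = soup.select("#content > div:nth-of-type(2) > p > a > img")
src = result[0]["src"] if result else None  # select() returns a list, possibly empty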
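
To make the selector approach concrete, here is a minimal sketch that runs against a small stand-in document. The `html_source` string is an assumption made up for illustration (only the img tag comes from the question); it simply mirrors the structure implied by the XPath. It uses `select_one()`, which returns the first match or `None`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; only the <img> tag is from the question.
html_source = """
<html><body>
<div id="content">
  <div>first div</div>
  <div>
    <p><a href="#"><img alt="1ee7aca0cf5b0132dd7a005056a9545d" src="http://assets.amuniversal.com/1ee7aca0cf5b0132dd7a005056a9545d"></a></p>
  </div>
</div>
</body></html>
"""

soup = BeautifulSoup(html_source, "html.parser")

# CSS equivalent of //*[@id="content"]/div[2]/p/a/img
img = soup.select_one("#content > div:nth-of-type(2) > p > a > img")

# Guard against a missing element before reading the attribute.
src = img["src"] if img is not None else None
print(src)
```

If the real page nests the elements differently, the selector will need adjusting to match.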
har07
  • This doesn't work :/ I get TypeError: 'NoneType' object is not callable – Ninjinx Jun 04 '15 at 08:34
  • Nothing in this answer can trigger that exception. However, `result` can be an empty list if the HTML source doesn't contain an element that satisfies the selector. – har07 Jun 04 '15 at 09:02
  • To cross-check, try saving the `soup` object to a file and see whether the file contains the expected element. You can't cross-check by inspecting the element in the browser, as the two may differ: some elements might be generated by JavaScript (BeautifulSoup can't execute JS, while your browser can). – har07 Jun 04 '15 at 09:03
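
The cross-check suggested in the comments could look roughly like this. It is only a sketch: `html_source` is a hypothetical stand-in for whatever HTML your script actually received, and the dump file name is arbitrary.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML your script downloaded (not the browser's DOM).
html_source = '<div id="content"><p>No image here yet</p></div>'
selector = "#content > div:nth-of-type(2) > p > a > img"

soup = BeautifulSoup(html_source, "html.parser")

# Write out what BeautifulSoup actually parsed so it can be inspected by hand.
dump_path = os.path.join(tempfile.gettempdir(), "soup_dump.html")
with open(dump_path, "w", encoding="utf-8") as fh:
    fh.write(soup.prettify())

# An empty list here means the element is absent from the fetched HTML.
matches = soup.select(selector)
print("saved to", dump_path, "matches:", len(matches))
```

If the element is missing from the dump but visible in the browser, it is probably being added by JavaScript after the page loads.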
Since you are already familiar with XPath, why not use the lxml parser? It lets you find elements using XPath directly. Here is a function that does just that:

from lxml import html

def find_by_xpath(element_source, xpath_expression):
    # Parse the HTML string and return every node matching the XPath expression
    root = html.fromstring(element_source)
    return root.xpath(xpath_expression)
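
As a usage sketch, the function can be called with the XPath from the question, with `/@src` appended so that it returns the attribute value rather than the element. The `sample` document below is an assumption invented for illustration; only the img tag is taken from the question.

```python
from lxml import html


def find_by_xpath(element_source, xpath_expression):
    # Parse the HTML string and return every node matching the XPath expression
    root = html.fromstring(element_source)
    return root.xpath(xpath_expression)


# Hypothetical stand-in for the real page.
sample = """
<html><body>
<div id="content">
  <div>first div</div>
  <div>
    <p><a href="#"><img alt="1ee7aca0cf5b0132dd7a005056a9545d" src="http://assets.amuniversal.com/1ee7aca0cf5b0132dd7a005056a9545d"></a></p>
  </div>
</div>
</body></html>
"""

# xpath() returns a list; appending /@src selects the attribute values directly.
srcs = find_by_xpath(sample, '//*[@id="content"]/div[2]/p/a/img/@src')
src = srcs[0] if srcs else None
print(src)
```

Note that an XPath copied from browser developer tools describes the browser's DOM, so it may not match the raw HTML exactly.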
Gustavo
SEDaradji