Convert XPath to Beautiful Soup

Question

I have a page that contains the following tag:

<img alt="1ee7aca0cf5b0132dd7a005056a9545d" src="http://assets.amuniversal.com/1ee7aca0cf5b0132dd7a005056a9545d">

I know its XPath:

//*[@id="content"]/div[2]/p/a/img

How do I access that tag and get its `src` attribute using BeautifulSoup?

score 9 · Answer 1 · answered Jun 04 '15 at 07:09

You can try converting your XPath expression into an equivalent CSS selector and then passing it to BeautifulSoup's select() method, which accepts a CSS selector:

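from bs4 import BeautifulSoup
soup = BeautifulSoup("your html source", "html.parser")
result = soup.select("#content > div:nth-of-type(2) > p > a > img")
print(result[0]["src"])  # select() returns a list of matching tags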
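The selector above is a hand translation of the XPath. A minimal sketch of automating that translation for this kind of simple path follows. `xpath_to_css` is a hypothetical helper, not part of Beautiful Soup, and it only understands `/tag`, `/tag[n]`, `[@id="..."]` and `//` steps. The sample HTML is an assumed page structure, since the question shows only the `<img>` tag. It assumes Beautiful Soup 4 (the `bs4` package), which provides `select()` and `select_one()`.

```python
import re

from bs4 import BeautifulSoup

# One XPath location step: "/" or "//", a tag name or "*", and an optional
# [@id="..."] or [n] predicate. Anything else is rejected.
STEP = re.compile(r'(//|/)(\*|[A-Za-z][\w-]*)(?:\[@id="([^"]+)"\]|\[(\d+)\])?')


def xpath_to_css(xpath):
    """Hypothetical helper: translate a *simple* XPath into a CSS selector."""
    selector = ""
    pos = 0
    while pos < len(xpath):
        m = STEP.match(xpath, pos)
        if m is None:
            raise ValueError(f"unsupported XPath near {xpath[pos:]!r}")
        sep, tag, elem_id, index = m.groups()
        if elem_id is not None:
            # *[@id="x"] -> #x, div[@id="x"] -> div#x
            css = ("" if tag == "*" else tag) + "#" + elem_id
        elif index is not None:
            # tag[n] counts same-named siblings (:nth-of-type);
            # *[n] counts all element siblings (:nth-child)
            pseudo = "nth-child" if tag == "*" else "nth-of-type"
            css = f"{tag}:{pseudo}({index})"
        else:
            css = tag
        if selector:
            # "/" means direct child (" > "), "//" means any descendant (" ")
            selector += (" > " if sep == "/" else " ") + css
        else:
            # The first step's separator is ignored, so a leading "/" is
            # not anchored to the document root in this sketch.
            selector = css
        pos = m.end()
    return selector


# Assumed page structure around the <img> tag from the question
html_source = """<div id="content">
  <div>first div</div>
  <div>
    <p><a href="#"><img alt="1ee7aca0cf5b0132dd7a005056a9545d"
      src="http://assets.amuniversal.com/1ee7aca0cf5b0132dd7a005056a9545d"></a></p>
  </div>
</div>"""

selector = xpath_to_css('//*[@id="content"]/div[2]/p/a/img')
print(selector)  # #content > div:nth-of-type(2) > p > a > img

soup = BeautifulSoup(html_source, "html.parser")
img = soup.select_one(selector)  # first match, or None if nothing matches
print(img["src"] if img is not None else "no match")
```

For anything beyond plain child/descendant steps with ids or positions (functions, axes, attribute tests), translating by hand or querying with lxml's XPath support is more reliable than extending a regex like this.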

answered Jun 04 '15 at 07:09

har07

this doesn't work :/ I get TypeError: 'NoneType' object is not callable – Ninjinx Jun 04 '15 at 08:34

There's nothing in this answer that can trigger that exception. However, `result` will be an empty list if the HTML source doesn't contain an element that matches the selector. – har07 Jun 04 '15 at 09:02
To cross-check, try saving the `soup` object to a file and see whether the file contains the expected element. You can't cross-check by inspecting the element in the browser, because the two may differ: some elements might be generated by JavaScript, which BeautifulSoup can't execute but your browser can. – har07 Jun 04 '15 at 09:03
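A minimal sketch of that cross-check, assuming the page has already been downloaded into a string. The markup below is an invented stand-in in which the `<img>` is missing, as it would be if JavaScript added it only in the browser, and `soup_dump.html` in the system temp directory is an arbitrary file name. It also shows that select() returns an empty list, not None, when nothing matches.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Invented stand-in for HTML fetched without a browser: the <img> is absent
raw_html = '<div id="content"><div></div><div><p><a href="#"></a></p></div></div>'
soup = BeautifulSoup(raw_html, "html.parser")

# Dump exactly what BeautifulSoup parsed so it can be compared with the
# browser's "inspect element" view
dump_path = Path(tempfile.gettempdir()) / "soup_dump.html"
dump_path.write_text(soup.prettify(), encoding="utf-8")
print("wrote", dump_path)

# select() always returns a list; here it is empty because nothing matches
result = soup.select("#content > div:nth-of-type(2) > p > a > img")
print(result)  # []
if not result:
    print("no <img> in the parsed HTML; it may be generated by JavaScript")
```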

score 2 · Answer 2 · edited Feb 12 '20 at 06:06

Since you are already familiar with XPath, why not use the lxml parser? It lets you find elements with XPath directly. Here is a function that does just that:

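from lxml import html

def find_by_xpath(element_source, xpath_expression):
    # Parse the HTML string, then evaluate the XPath against the parsed tree
    root = html.fromstring(element_source)
    return root.xpath(xpath_expression)  # always a list of results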
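A short usage sketch of find_by_xpath, with sample markup that assumes a page structure around the question's `<img>` tag. Appending `/@src` to the XPath makes lxml return the attribute values directly as strings; selecting the element and calling `.get("src")` works as well. The function is repeated so the snippet runs on its own.

```python
from lxml import html


def find_by_xpath(element_source, xpath_expression):
    root = html.fromstring(element_source)
    return root.xpath(xpath_expression)


# Sample markup modeled on the question; the surrounding structure is assumed
page = """<html><body>
<div id="content">
  <div>first div</div>
  <div><p><a href="#"><img alt="1ee7aca0cf5b0132dd7a005056a9545d"
    src="http://assets.amuniversal.com/1ee7aca0cf5b0132dd7a005056a9545d"></a></p></div>
</div>
</body></html>"""

# Appending /@src returns the attribute values themselves
srcs = find_by_xpath(page, '//*[@id="content"]/div[2]/p/a/img/@src')
print(srcs[0])

# Or select the element and read the attribute from it
imgs = find_by_xpath(page, '//*[@id="content"]/div[2]/p/a/img')
print(imgs[0].get("src"))
```

Because xpath() returns a list, check that it is non-empty before indexing when the element might be missing.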

edited Feb 12 '20 at 06:06

Gustavo

answered May 23 '17 at 10:07

SEDaradji

I get 'html' not defined. – Katastic Voyage Dec 16 '19 at 22:58
add this `from lxml import html` – Gustavo Feb 12 '20 at 02:05
