I am scraping web data and need to return only the text associated with a hyperlink. Neither the link target nor its text is known in advance; only the class of the enclosing element is known. Here is example HTML:
<div class="a-column SsCol" role = "gridcell">
<h3 class="a-spacing-none SsName">
<span class="a-size-medium a-text-bold">
<a href="/gp/aag/main/ref=sm_name_2?ie=UTF8&ids=15112acd">Direct Name</a>
</span>
</h3>
</div>
Alternatively, the desired text may be the alt text of an image instead of the text of a hyperlink:
<div class="a-column SsCol" role = "gridcell">
<h3 class="a-spacing-none SsName">
<img alt="Direct Name" src="https://images-hosted.com//01x-j.gi">
</h3>
</div>
I have tried the method below:
from lxml import html
import requests
response = requests.get('https://www.exampleurl.com/')
doc = html.fromstring(response.content)
text1 = doc.xpath("//*[contains(@class, 'SsName')]/text()")
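One likely reason this XPath misses the name: `/text()` selects only the text nodes that are direct children of the `SsName` element, and in the first example the link text sits two levels deeper, inside `<span>` and `<a>`. The sketch below parses the sample markup from above (inlined as a string instead of fetched with `requests`) and compares `/text()` with `//text()`, which selects text nodes at any depth. Note that neither form reaches the image variant, because an `alt` value is an attribute, not a text node.

```python
from lxml import html

# Sample markup copied from the hyperlink example above.
SAMPLE = """
<div class="a-column SsCol" role="gridcell">
<h3 class="a-spacing-none SsName">
<span class="a-size-medium a-text-bold">
<a href="/gp/aag/main/ref=sm_name_2?ie=UTF8&ids=15112acd">Direct Name</a>
</span>
</h3>
</div>
"""

doc = html.fromstring(SAMPLE)

# /text(): only text nodes that are direct children of <h3>.
# Here those are at most the newlines around <span>, so no name is found.
direct = doc.xpath("//*[contains(@class, 'SsName')]/text()")
print([t.strip() for t in direct])

# //text(): text nodes at any depth below <h3>, including the <a> text.
# Whitespace-only nodes are filtered out before printing.
nested = doc.xpath("//*[contains(@class, 'SsName')]//text()")
print([t.strip() for t in nested if t.strip()])
```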
I am using lxml rather than BeautifulSoup, but I am willing to switch if that is recommended. The desired result is:
print(text1)
['Direct Name']
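A minimal sketch of one way to cover both layouts with lxml alone, assuming the name always appears either as the text of an `<a>` or as the `alt` of an `<img>` somewhere under the `SsName` element. It uses an XPath union (`|`) of the two cases. The names `LINK_HTML`, `IMG_HTML`, `SS_NAME` and `extract_names` are illustrative, not part of the original code.

```python
from lxml import html

LINK_HTML = """
<div class="a-column SsCol" role="gridcell">
<h3 class="a-spacing-none SsName">
<span class="a-size-medium a-text-bold">
<a href="/gp/aag/main/ref=sm_name_2?ie=UTF8&ids=15112acd">Direct Name</a>
</span>
</h3>
</div>
"""

IMG_HTML = """
<div class="a-column SsCol" role="gridcell">
<h3 class="a-spacing-none SsName">
<img alt="Direct Name" src="https://images-hosted.com//01x-j.gi">
</h3>
</div>
"""

# Match "SsName" as a whole class token, so e.g. "SsNameFoo" is not matched.
SS_NAME = "//*[contains(concat(' ', normalize-space(@class), ' '), ' SsName ')]"

# Union: link text anywhere below the element, or the alt attribute of an image.
XPATH = f"{SS_NAME}//a/text() | {SS_NAME}//img/@alt"


def extract_names(markup):
    """Return the link text or image alt text under each SsName element."""
    doc = html.fromstring(markup)
    # Both text() and @alt results come back as str subclasses in document order.
    return [str(value).strip() for value in doc.xpath(XPATH)]


print(extract_names(LINK_HTML))
print(extract_names(IMG_HTML))
```

With a live page, `extract_names(response.text)` would be used in the same way. The token-safe `concat(...)` test is stricter than `contains(@class, 'SsName')`; the simpler form from the question also works on this sample if partial class matches are not a concern.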