
The HTML code looks like this:

<img alt="Papa&#39;s Cupcakeria To Go!" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-old-hires=""  class="a-dynamic-image  a-stretch-vertical" id="landingImage" data-a-dynamic-image="{&quot;https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L.png&quot;:[512,512],&quot;https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SX425_.png&quot;:[425,425],&quot;https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SX466_.png&quot;:[466,466],&quot;https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SY450_.png&quot;:[450,450],&quot;https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SY355_.png&quot;:[355,355]}" style="max-width:512px;max-height:512px;">

I want to get "https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L.png". Currently I'm using

extract_item(hxs.xpath("//img[@id='landingImage']/@data-a-dynamic-image"))

but what I get is the entire content of that attribute. How can I get only the first URL?

FakeYG

1 Answer


If you just want the first URL:

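import html

full_content = extract_item(hxs.xpath("//img[@id='landingImage']/@data-a-dynamic-image"))
# The value is a JSON object keyed by quoted URLs; unescape any leftover &quot; entities
list_contents = html.unescape(full_content).split('"')
first_image = list_contents[1]  # text between the first pair of double quotes
print(first_image)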
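Since the attribute value is a JSON object mapping each image URL to its [width, height], another option is to parse it with the standard `json` module and take the first key instead of splitting on a delimiter. This is a minimal sketch, assuming the input is the string the XPath returns; `attr_value` below is a hypothetical, shortened sample built from the question's HTML, and `first_image_url` is an illustrative helper name.

```python
import html
import json

# Hypothetical sample: the @data-a-dynamic-image value from the question,
# shortened to three of its five entries (entities already decoded).
attr_value = (
    '{"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L.png":[512,512],'
    '"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SX425_.png":[425,425],'
    '"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SY355_.png":[355,355]}'
)


def first_image_url(value):
    """Return the first URL key of the JSON object stored in the attribute."""
    # Unescape in case &quot; entities are still present in the string.
    images = json.loads(html.unescape(value))  # {url: [width, height], ...}
    # json.loads keeps the document's key order (dicts are ordered in Python 3.7+).
    return next(iter(images))


print(first_image_url(attr_value))
```

Taking the first key assumes the page keeps listing the full-size image first. If the real goal is the largest image, choosing the key with the biggest [width, height] would not depend on that order.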

Alternatively, you can refer to this for extracting the URL with a regular expression.
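A regex version could look like the sketch below. It assumes the URLs in the attribute are always wrapped in double quotes and start with `http://` or `https://`; the pattern, the `first_url_regex` helper, and the shortened `attr_value` sample are all illustrative, not taken from the original answer.

```python
import html
import re
from typing import Optional

# Captures the first double-quoted http(s) URL in the string.
FIRST_URL = re.compile(r'"(https?://[^"]+)"')

# Hypothetical sample: shortened @data-a-dynamic-image value from the question.
attr_value = (
    '{"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L.png":[512,512],'
    '"https://images-na.ssl-images-amazon.com/images/I/814vdYZK17L._SX425_.png":[425,425]}'
)


def first_url_regex(value: str) -> Optional[str]:
    """Return the first quoted URL, or None if the value contains none."""
    # Unescape first so the pattern also works if &quot; entities remain.
    match = FIRST_URL.search(html.unescape(value))
    return match.group(1) if match else None


print(first_url_regex(attr_value))
```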

Shivam Mishra