4

Trying to grab the URLs or URL snippets of images from a webpage using Google Sheets IMPORTXML function. I'm fairly sure I have the Xpath right, but I either get nothing or a "that data can't be parsed" - and yet I've seen other examples here of people grabbing URLs in Google Sheets this way, though in slightly different circumstances, none of which seem to be applicable here.

Trying to grab from this page - this is the relevant HTML:

<div id="product_image" class="A_ProductImg">
        <div class="bx-wrapper" style="max-width: 100%;"><div class="bx-viewport" style="width: 100%; overflow: hidden; position: relative; height: 540px;"><ul class="A_ProductImgSlider" style="width: auto; position: relative;">
         <li class="A_ItemList" style="float: none; list-style: none; position: absolute; width: 540px; z-index: 50; display: block;">
          <div class="image A_ItemImg A_SquareOuter">
           <img src="/ec/img/D3-64I011012_M_s.jpg" onerror="this.src='/ec/images/common/NoImage.gif'" alt="main product image" id="mainImage" class="A_ItemProductImg A_Square">
          </div>
         </li>
         <li class="A_ItemList" style="float: none; list-style: none; position: absolute; width: 540px; z-index: 0; display: none;"><div class="image A_ItemImg A_SquareOuter"><img src="/ec/img/D3-64I011012_S_1m.jpg" alt="product image 1" class="A_ItemProductImg A_Square"></div></li>
         <li class="A_ItemList" style="float: none; list-style: none; position: absolute; width: 540px; z-index: 0; display: none;"><div class="image A_ItemImg A_SquareOuter"><img src="/ec/img/D3-64I011012_S_2m.jpg" alt="product image 2" class="A_ItemProductImg 

I've made the following query to try and work with the subsequent HTML:

=IMPORTXML(A2,"//*[@id='product_image']/div[1]/div[1]/ul/li[1]/div/img src")

A2 having the relevant URL.

I think the Xpath is correct there, but not sure why it won't give me the result I'm looking for. I've played around with it a bit, but no luck.

Rubén
  • 34,714
  • 9
  • 70
  • 166
thecraigsea
  • 98
  • 1
  • 7

1 Answers1

5

How about this answer? Please think of this as just one of several answers.

Sample formulas:

In this case, https://www.mikigakki.com/ec/pro/disp/H/D3-64I011012?sFlg=2 is put in the cell "A1".

Pattern 1:

=IMPORTXML(A1,"//img/@src")

enter image description here

Pattern 2:

=IMPORTXML(A1,"//li//@src")

enter image description here

Pattern 3:

=IMPORTXML(A1,"//li[position()>1]//@src")

enter image description here

Pattern 4:

=ARRAYFORMULA("https://www.mikigakki.com"&IMPORTXML(A1,"//li[position()>1]//@src"))

enter image description here

Pattern 5:

=ARRAYFORMULA("https://www.mikigakki.com"&IMPORTXML(A1,"//li[1]//@src"))

enter image description here

Note:

  • If you want to retrieve the 1st image url using *[@id='product_image'], you can also use the following formula.

    =IMPORTXML(A1,"//*[@id='product_image']/ul/li[1]/div/img/@src")
    

Reference:

If I misunderstood your question and this was not the result you want, I apologize.

Tanaike
  • 181,128
  • 11
  • 97
  • 165
  • Those solutions work very well. Thank you! よく助かりました!本当にありがとうございました! – thecraigsea Oct 12 '19 at 09:08
  • @thecraigsea Thank you for replying. I'm glad your issue was resolved. 無事解決できたとのことで安心しました。 – Tanaike Oct 12 '19 at 23:06
  • 1
    The link no longer works. I just tried with a ramdomly chosen link (`=IMPORTXML("https://www.mikigakki.com/ec/pro/disp/A/20211029g200ec?sFlg=2","//img/@src")`), the formula still works – Rubén Jan 05 '23 at 17:39
  • @Rubén Thank you for checking it. – Tanaike Jan 05 '23 at 23:48