0

Using xpath "//*" as in the code below will retrieve the whole XML document.

=IMPORTXML("https://api.bol.com/catalog/v4/search/?apikey=myapikey&format=xml&q=isbn","//*")

I haven't been able to just extract specific elements like Title. I've looked at the xpath documentation, tried a dozen or so variations, but I keep getting an empty result. What would the right Xpath be for extracting the Title?

An example XML document can be found here: https://developers.bol.com/wp-content/pagina/handleiding/xml/search.xml

This XML file does not appear to have any style information associated with it. The document tree is shown below.

   <SearchResults xmlns="http://api.bol.com/openapi-4.0.0">
     <OriginalRequest>
       <Category/>
     </OriginalRequest>
     <TotalResultSize>4875</TotalResultSize>
     <Products>
       <Id>9200000026739211</Id>
       <EAN>5051888195819</EAN>
       <GPC>dvd</GPC>
       <Title>Harry Potter - Complete 8-Film Collection</Title>
       <SpecsTag>Warner Home Video</SpecsTag>
       <Summary>Dvd | 8 disks | mei 2014 | null</Summary>
       <Rating>48</Rating>
       <ShortDescription>
Rubén
  • 34,714
  • 9
  • 70
  • 166
dean2020
  • 645
  • 2
  • 8
  • 25
  • 3
    I'm pretty sure that's because you are [not using namespaces](http://stackoverflow.com/a/26036309/11683) in your xpath query. – GSerg Apr 10 '16 at 20:39
  • Thank you Serge. I'm just learning about xpath. I did try using the namespace "http://api.bol.com/openapi-4.0.0" as part of the xpath before, but couldn't get that to work either. I'm probably doing something wrong. would greatly appreciate an example of how to contruct the xpath using the "http://api.bol.com/openapi-4.0.0" namespace and Title element in this case. – dean2020 Apr 10 '16 at 21:12
  • See http://stackoverflow.com/questions/10981312/xml-element-has-namespace-my-xpath-does-not-work – Markus Apr 10 '16 at 21:21
  • Thanks Serge and Markus. – dean2020 Apr 11 '16 at 08:19

1 Answers1

0

You could use this sort of roundabout workaround:

=arrayformula(REGEXREPLACE(filter(index(IMPORTDATA("https://developers.bol.com/wp-content/pagina/handleiding/xml/search.xml"),,1),left(index(IMPORTDATA("https://developers.bol.com/wp-content/pagina/handleiding/xml/search.xml"),,1),7)="<Title>"),"<Title>|</Title>",""))

What I am doing here is first using importdata to pull in the xml. You will notice that it spans more than one column, so in order to ignore the additional data, i wrap it in the INDEX formula so that I can nest it and filter by it.

enter image description here

I then use the filter function and specify that my condition is the row must start with the tag im choosing, in this case it is <Title> by using the Left function.

enter image description here

At this point you would see the only values showing are all title rows - which I then clean up with regexreplace to remove the beginning and end tags.

enter image description here

Aurielle Perlmann
  • 5,323
  • 1
  • 15
  • 26