xpath troubles - no results

Question

Using xpath "//*" as in the code below will retrieve the whole XML document.

=IMPORTXML("https://api.bol.com/catalog/v4/search/?apikey=myapikey&format=xml&q=isbn","//*")

I haven't been able to just extract specific elements like Title. I've looked at the xpath documentation, tried a dozen or so variations, but I keep getting an empty result. What would the right Xpath be for extracting the Title?

An example XML document can be found here: https://developers.bol.com/wp-content/pagina/handleiding/xml/search.xml

This XML file does not appear to have any style information associated with it. The document tree is shown below.

   <SearchResults xmlns="http://api.bol.com/openapi-4.0.0">
     <OriginalRequest>
       <Category/>
     </OriginalRequest>
     <TotalResultSize>4875</TotalResultSize>
     <Products>
       <Id>9200000026739211</Id>
       <EAN>5051888195819</EAN>
       <GPC>dvd</GPC>
       <Title>Harry Potter - Complete 8-Film Collection</Title>
       <SpecsTag>Warner Home Video</SpecsTag>
       <Summary>Dvd | 8 disks | mei 2014 | null</Summary>
       <Rating>48</Rating>
       <ShortDescription>

I'm pretty sure that's because you are [not using namespaces](http://stackoverflow.com/a/26036309/11683) in your xpath query. — GSerg, Apr 10 '16 at 20:39
Thank you Serge. I'm just learning about xpath. I did try using the namespace "http://api.bol.com/openapi-4.0.0" as part of the xpath before, but couldn't get that to work either. I'm probably doing something wrong. would greatly appreciate an example of how to contruct the xpath using the "http://api.bol.com/openapi-4.0.0" namespace and Title element in this case. — dean2020, Apr 10 '16 at 21:12
See http://stackoverflow.com/questions/10981312/xml-element-has-namespace-my-xpath-does-not-work — Markus, Apr 10 '16 at 21:21

score 0 · Answer 1 · answered May 29 '16 at 01:08

You could use this sort of roundabout workaround:

=arrayformula(REGEXREPLACE(filter(index(IMPORTDATA("https://developers.bol.com/wp-content/pagina/handleiding/xml/search.xml"),,1),left(index(IMPORTDATA("https://developers.bol.com/wp-content/pagina/handleiding/xml/search.xml"),,1),7)="<Title>"),"<Title>|</Title>",""))

What I am doing here is first using importdata to pull in the xml. You will notice that it spans more than one column, so in order to ignore the additional data, i wrap it in the INDEX formula so that I can nest it and filter by it.

I then use the filter function and specify that my condition is the row must start with the tag im choosing, in this case it is <Title> by using the Left function.

At this point you would see the only values showing are all title rows - which I then clean up with regexreplace to remove the beginning and end tags.

xpath troubles - no results

1 Answers1