0

I am using Google Sheet to read XML data. Unfortunately I am struggling to extract the XML correctly. https://www.w3schools.com/xml/xpath_syntax.asp, SO and other Googled sources didn't help.

While processing the simplified data below

<item>
    <profile_url>www.profile1.com</profile_url>
    <answers>
        <answers_item>
            <answer>answer text 1.1</answer>
        </answers_item>
        <answers_item>
            <answer>answer text 1.2</answer>
        </answers_item>
    </answers>
<item>
<item>
    <profile_url>www.profile2.com</profile_url>
    <answers>
        <answers_item>
            <answer>answer text 2.1</answer>
        </answers_item>
        <answers_item>
            <answer>answer text 2.2</answer>
        </answers_item>
    </answers>
<item>

The output needs to be in the following format so additional formatting isn't required in a spreadsheet

www.profile1.com   answer text 1.1   answer text 1.2
www.profile2.com   answer text 2.1   answer text 2.2

Variants tried and failed to give the correct output are

"//profile_url //answers/answers_item/answer"
"//profile_url | //answers/answers_item/answer"

Thanks in advance Rob


Tried fixes from suggested duplicate

1) Using string-join

string-join(//item/(concat(profile_url/text(), '.', answers/answers_item/answer//text())), "&#10;")

Gave the follow error (maybe because XPath used is not 2.0)

Imported Xml content can not be parsed.

2) Using

concat(//profile_url/text(), " ", //answers/answers_item/answer/text()) 

Gave the first entry only.

Community
  • 1
  • 1

1 Answers1

0

Assuming you have only two answers_items in each item, the following xpath

concat(//item[1]/profile_url, " ", //item[1]/answers/answers_item[1]/answer, " " , //item[1]/answers/answers_item[2]/answer)

should give the required output as :

www.profile1.com answer text 1.1 answer text 1.2 for the first item

For the second one, you need to modify item[1] to item[2]

This can be easily done if you are working in Java.

SomeDude
  • 13,876
  • 5
  • 21
  • 44
  • Thanks, but there are 6,000 records to process and the output goes to a spreadsheet - so no Java present.So the iteration needs to happen within the XPath statement. – Robert Sharpe Feb 15 '17 at 18:15
  • When you say "no java" do you mean this is all taking place on the web page itself? If so, sounds like you need to use Javascript or VBScript to accomplish what you want here. What is it using to "get the data to the spreadsheet" now? – Bill Hileman Feb 15 '17 at 19:34
  • @RobertSharpe `iteration needs to happen within the XPath statement.` ? It is not possible for your requirement. You cannot use `concat` to print all nodes. You can get the nodes themselves but not the text in them concatenated in the format you want. BTW how are you evaluating your XPath expressions ? Which engine? – SomeDude Feb 15 '17 at 19:59
  • @Bil-Hilerman As the first sentence says "I am using Google Sheet to read XML data." It is happening in a Google Sheet using the IMPORTXML function (see https://support.google.com/docs/answer/3093342?hl=en) that takes an XPath parameter to specify the fields from the read XML data. – Robert Sharpe Feb 16 '17 at 22:13
  • @svasa cancat was just an example someone else suggested. The XPath expressions are being evaluated by the IMPORTXML function. I am sure that a powerful language like XPath can format more than 1 parameter in an easy to import way - it is just knowing the correct command. – Robert Sharpe Feb 16 '17 at 22:18