1

I trying to use Xpath to get the DataTable Headers.

My output should be:

ItemNum|Item|ResultCode|Status|ExtBackLinks|RefDomains|AnalysisResUnitsCost|ACRank|ItemType|IndexedURLs|GetTopBackLinksAnalysisResUnitsCost|DownloadBacklinksAnalysisResUnitsCost|DownloadRefDomainBacklinksAnalysisResUnitsCost|RefIPs|RefSubNets|RefDomainsEDU|ExtBackLinksEDU|RefDomainsGOV|ExtBackLinksGOV|RefDomainsEDU_Exact|ExtBackLinksEDU_Exact|RefDomainsGOV_Exact|ExtBackLinksGOV_Exact|CrawledFlag|LastCrawlDate|LastCrawlResult|RedirectFlag|FinalRedirectResult|OutDomainsExternal|OutLinksExternal|OutLinksInternal|OutLinksPages|LastSeen|Title|RedirectTo|Language|LanguageDesc|LanguageConfidence|LanguagePageRatios|LanguageTotalPages|RefLanguage|RefLanguageDesc|RefLanguageConfidence|RefLanguagePageRatios|RefLanguageTotalPages|CrawledURLs|RootDomainIPAddress|TotalNonUniqueLinks|NonUniqueLinkTypeHomepages|NonUniqueLinkTypeIndirect|NonUniqueLinkTypeDeleted|NonUniqueLinkTypeNoFollow|NonUniqueLinkTypeProtocolHTTPS|NonUniqueLinkTypeFrame|NonUniqueLinkTypeImageLink|NonUniqueLinkTypeRedirect|NonUniqueLinkTypeTextLink|RefDomainTypeLive|RefDomainTypeFollow|RefDomainTypeHomepageLink|RefDomainTypeDirect|RefDomainTypeProtocolHTTPS|CitationFlow|TrustFlow|TrustMetric|TopicalTrustFlow_Topic_0|TopicalTrustFlow_Value_0|TopicalTrustFlow_Topic_1|TopicalTrustFlow_Value_1|TopicalTrustFlow_Topic_2|TopicalTrustFlow_Value_2

This is the original XML:

<Result Code="OK" ErrorMessage="" FullError="">
<GlobalVars FirstBackLinkDate="2012-09-21" IndexBuildDate="2018-05-24 19:47:18" IndexType="0" MostRecentBackLinkDate="2018-04-23" QueriedRootDomains="1" QueriedSubDomains="0" QueriedURLs="0" QueriedURLsMayExist="0" ServerBuild="2018-06-11 13:52:01" ServerName="BRUNO28" ServerVersion="1.0.6736.23160" UniqueIndexID="20180524194718-HISTORICAL"/>
<DataTables Count="1">
<DataTable Name="Results" RowsCount="1" Headers="ItemNum|Item|ResultCode|Status|ExtBackLinks|RefDomains|AnalysisResUnitsCost|ACRank|ItemType|IndexedURLs|GetTopBackLinksAnalysisResUnitsCost|DownloadBacklinksAnalysisResUnitsCost|DownloadRefDomainBacklinksAnalysisResUnitsCost|RefIPs|RefSubNets|RefDomainsEDU|ExtBackLinksEDU|RefDomainsGOV|ExtBackLinksGOV|RefDomainsEDU_Exact|ExtBackLinksEDU_Exact|RefDomainsGOV_Exact|ExtBackLinksGOV_Exact|CrawledFlag|LastCrawlDate|LastCrawlResult|RedirectFlag|FinalRedirectResult|OutDomainsExternal|OutLinksExternal|OutLinksInternal|OutLinksPages|LastSeen|Title|RedirectTo|Language|LanguageDesc|LanguageConfidence|LanguagePageRatios|LanguageTotalPages|RefLanguage|RefLanguageDesc|RefLanguageConfidence|RefLanguagePageRatios|RefLanguageTotalPages|CrawledURLs|RootDomainIPAddress|TotalNonUniqueLinks|NonUniqueLinkTypeHomepages|NonUniqueLinkTypeIndirect|NonUniqueLinkTypeDeleted|NonUniqueLinkTypeNoFollow|NonUniqueLinkTypeProtocolHTTPS|NonUniqueLinkTypeFrame|NonUniqueLinkTypeImageLink|NonUniqueLinkTypeRedirect|NonUniqueLinkTypeTextLink|RefDomainTypeLive|RefDomainTypeFollow|RefDomainTypeHomepageLink|RefDomainTypeDirect|RefDomainTypeProtocolHTTPS|CitationFlow|TrustFlow|TrustMetric|TopicalTrustFlow_Topic_0|TopicalTrustFlow_Value_0|TopicalTrustFlow_Topic_1|TopicalTrustFlow_Value_1|TopicalTrustFlow_Topic_2|TopicalTrustFlow_Value_2" MaxTopicsRootDomain="30" MaxTopicsSubDomain="20" MaxTopicsURL="10" TopicsCount="3">
<Row>
0|nu.nl|OK|Found|508322106|165344|508322106|-1|1|4149991|5000|512472097|3356880|59147|26204|233|3613|43|308|73|1757|4|12|False| | |True| |5|10|44|1722150| |NU - Het laatste nieuws het eerst op NU.nl|https://www.nu.nl/|nl|Dutch/Flemish|92|99.9|482980|nl,en,de|Dutch/Flemish,English,German|87,93,58|96.5,3.1,0.1|76319583|1915923|52.85.201.19|611833777|15034990|53120677|444371798|95283418|52384870|388104|53497551|5655999|552292123|102171|115787|21952|150164|49554|76|70|70|News/Breaking News|69|Sports/Resources|45|Arts/Radio|43
</Row>
</DataTable>
</DataTables>
</Result>

When I use this Xpath command in Google Sheets:

=importxml("http://enterprise.majesticseo.com/api_command?privatekey=xxx&accessToken=xxx&cmd=GetIndexItemInfo&item0=nu.nl&items=1","//DataTable"

I get the Row results. Which Is great but I also need the Header names in the first row of my sheet.

Rubén
  • 34,714
  • 9
  • 70
  • 166
Thijs
  • 25
  • 2

1 Answers1

4

a short introduction to XPath :-)

With //DataTable you will get the full node of any <DataTable> (no namespaces involved here) anywhere in the XML.
It is - as a rule of thumb - better to be as specific as possible (rather use /Result/DataTables/DataTable). But this is not the answer to your question...

Just imagine an XML like this:

<root>
  <innerNode attr="1"><a>Some a content</a><b>Some b content</b></innerNode>
  <innerNode attr="2"><a>aaa</a><b>bbb</b></innerNode>
</root>

With /root/innerNode you'd get both <innerNode> with all content.

With /root/innerNode[(b/text())[1]="bbb"] you'd get just one <innerNode>, the one where <b>'s text() is "bbb"

With /root/innerNode[@attr="1"] you'd get the one <innerNode> where the attribute attr has a value of "2".

All three XPath samples bring back the full node, including sub nodes, attributes, whatever.

if you want just the value of an attribute you must ask for it:

(/root/innerNode/@attr)[2] 

... returns the attribute's value of the second <innerNode> (the 2nd occurance actually)

/root/innerNode[(b/text())[1]="Some b content"]/@attr

... returns the attribute's value of the <innerNode> where the <b> has a text() 0f "Some b content"

back to your question

You want to read the attribute Headers within the element <DataTable> living at /Result/DataTables. Just use

/Result/DataTables/DataTable/@Headers
Community
  • 1
  • 1
Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • Figured I'd ask the XML Ninja a question. Regarding https://stackoverflow.com/questions/51030750/sql-server-how-to-sort-in-order-as-entered/51030779#51030779 Is there a risk of XML Parse approach changing the sequence? – John Cappelletti Jun 26 '18 at 16:40
  • @JohnCappelletti commented there :-) – Shnugo Jun 26 '18 at 17:00
  • Wow, thanks for your help and explanation. That was very usefull and it worked! – Thijs Jun 27 '18 at 10:45