I am new to R and want to use it to fetch data from a website. I am trying to get city indexes and city names from the Yahoo API, so I need to parse an XML file. But when I try to get the values of some nodes with the getNodeSet() function, R returns an empty list. Could anyone give me some advice on this kind of issue? Thanks a lot!

yahoo link: Yahoo weather API

I've also added the XML file:

<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="346" yahoo:created="2015-07-30T02:48:20Z" yahoo:lang="zh-CN">
<diagnostics>
<publiclyCallable>true</publiclyCallable>
<url execution-start-time="24" execution-stop-time="114" execution-time="90">
<![CDATA[
http://wws.geotech.yahooapis.com/v1/counties/CN;start=0;count=1000
]]>
</url>
<user-time>121</user-time>
<service-time>90</service-time>
<build-version>0.2.154</build-version>
</diagnostics>
<results>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198131">
<woeid>26198131</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Wuwei</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198056">
<woeid>26198056</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Jinchang</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198129">
<woeid>26198129</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Lanzhou</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198130">
<woeid>26198130</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Baiyin</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198128">
<woeid>26198128</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Linxia Huizu</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198133">
<woeid>26198133</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Zhangye</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198127">
<woeid>26198127</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Dingxi</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198125">
<woeid>26198125</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Gannan Zangzu</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198042">
<woeid>26198042</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Ili Kazakh</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198043">
<woeid>26198043</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Kizilsu Kirghiz</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198047">
<woeid>26198047</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Aletai</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198049">
<woeid>26198049</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Hetian</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198262">
<woeid>26198262</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Jiamusi</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198263">
<woeid>26198263</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Shuangyashan</name>
</place>
<place xmlns="http://where.yahooapis.com/v1/schema.rng" xml:lang="en-US" yahoo:uri="http://where.yahooapis.com/v1/place/26198057">
<woeid>26198057</woeid>
<placeTypeName code="9">Prefecture</placeTypeName>
<name>Daxing'anling</name>
</place>
</results>
</query>
<!--  total: 121  -->
<!--  pprd1-node1004-lh1.manhattan.ne1.yahoo.com  -->

I tried this code:

> library(XML)
> temp = xmlTreeParse("yql.xml",useInternalNodes = TRUE)
> woeid = getNodeSet(temp,"//woeid")
> woeid

But it returns:

> list()
attr(,"class")
[1] "XMLNodeSet"
Rich Scriven
Lisen

1 Answer

woeid inherits the default namespace of its parent place element. To reference an element in a namespace using XPath, you first need to map a prefix to the corresponding namespace URI, then use that prefix in your XPath.

I'm not a frequent R user, but some online sources point me to something like this:

getNodeSet(temp, "//d:woeid", c(d="http://where.yahooapis.com/v1/schema.rng"))
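To make the prefix mapping concrete, here is a minimal sketch that can be run without the original file. It assumes the XML package's `xmlParse()` accepts a string when `asText = TRUE`, and it uses a trimmed, two-place copy of the response from the question; the prefix `d` and the variable names are arbitrary choices for illustration.

```r
library(XML)

# Trimmed copy of the response shown in the question, inlined as a string.
xml_text <- '<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng">
  <results>
    <place xmlns="http://where.yahooapis.com/v1/schema.rng">
      <woeid>26198131</woeid>
      <name>Wuwei</name>
    </place>
    <place xmlns="http://where.yahooapis.com/v1/schema.rng">
      <woeid>26198056</woeid>
      <name>Jinchang</name>
    </place>
  </results>
</query>'

doc <- xmlParse(xml_text, asText = TRUE)

# An unprefixed name in XPath means "element in no namespace",
# so this query does not match the namespaced woeid elements.
no_prefix <- getNodeSet(doc, "//woeid")

# Map an arbitrary prefix (here "d") to the namespace URI declared on <place>,
# then use that prefix in the XPath expression.
ns <- c(d = "http://where.yahooapis.com/v1/schema.rng")
woeid <- xpathSApply(doc, "//d:woeid", xmlValue, namespaces = ns)
print(woeid)
```

The prefix only has to match between the XPath string and the `namespaces` vector; it does not need to appear anywhere in the document itself.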
har07
  • The only difference in my code is that I put quotes around the name, `c("d" = ...)`, but that should not make a difference. I used `xmlTreeParse("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20geo.counties%20where%20place%3D%22CN%22&diagnostics=true", useInternal = TRUE)` on the previous line – Rich Scriven Jul 30 '15 at 06:28
  • @RichardScriven OK, I will test this again and keep you posted. Thanks a lot! – Lisen Jul 30 '15 at 06:37
  • Hello Richard and @har07, I tried again with the XML file but still didn't get the value of the node :( If I parse another XML file, though, `getNodeSet()` works fine. It is a little weird... – Lisen Jul 31 '15 at 05:45
  • @Lisen I don't have R on my local machine to test. Is it possible to have `xmlTreeParse()` read from a string instead of a file? (I'm looking for a way to test the code at http://www.r-fiddle.org/) – har07 Jul 31 '15 at 06:53
  • @har07 It seems that the function only accepts an XML file as a parameter. I also tested the XPath in this [link](http://www.freeformatter.com/xpath-tester.html). `//woeid` returns no match, but `//query` (the XML root) returns the content... So is the XPath for the woeid node incorrect? – Lisen Jul 31 '15 at 07:54
  • The root element doesn't have a default namespace (`xmlns="...."`). Try ignoring namespaces like so: `getNodeSet(temp, "//*[local-name()='woeid']")` – har07 Jul 31 '15 at 07:58
  • @har07 Thank you very much!! It returns the value! I'm not very familiar with namespaces. Could you please give me some pointers if you're not busy? :) – Lisen Jul 31 '15 at 08:39
  • @Lisen For now, these threads may give you some clues [1](http://stackoverflow.com/questions/23211020/complex-xpath-query-with-getnodeset-in-r/23211321#23211321), [2](http://stackoverflow.com/questions/24954792/xpath-and-namespace-specification-for-xml-documents-with-an-explicit-default-nam/24955051#24955051), [3](http://stackoverflow.com/questions/31177707/parsing-xml-to-get-element-value-using-lxml/31178720#31178720) – har07 Jul 31 '15 at 08:57
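
The `local-name()` workaround from the comments can be sketched the same way, here pulling both woeid and name into a data frame. This is only an illustration on a trimmed copy of the response: the helper `get_local()` is hypothetical, and parsing from a string assumes `xmlParse()` with `asText = TRUE` from the XML package.

```r
library(XML)

# Trimmed copy of the response shown in the question, inlined as a string.
xml_text <- '<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng">
  <results>
    <place xmlns="http://where.yahooapis.com/v1/schema.rng">
      <woeid>26198131</woeid>
      <name>Wuwei</name>
    </place>
    <place xmlns="http://where.yahooapis.com/v1/schema.rng">
      <woeid>26198056</woeid>
      <name>Jinchang</name>
    </place>
  </results>
</query>'

doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: match elements by local name only, ignoring
# whichever namespace they belong to.
get_local <- function(doc, tag) {
  xpathSApply(doc, sprintf("//*[local-name()='%s']", tag), xmlValue)
}

places <- data.frame(
  woeid = get_local(doc, "woeid"),
  name  = get_local(doc, "name"),
  stringsAsFactors = FALSE
)
print(places)
```

Matching on `local-name()` is convenient when the namespace URI is unknown, but it would also match same-named elements from any other namespace, so the explicit prefix mapping is the stricter option.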