
So I know this topic has been discussed extensively on here. I've found quite a few questions on the same thing but still can't figure out how to parse this XML file. I'm using R and I want to pull the longitude and latitude from the file.

I'm using this data and this guide but can't seem to make it work.

Here's what I do:

require(XML)  
data <- xmlParse("http://www.donatingplasma.org/index.php?option=com_storelocator&format=feed&searchall=1&Itemid=166&catid=-1&tagid=-1&featstate=0")
xml_data <- xmlToList(data)

That all works fine. The XML file is now a "large list." When I try to extract the latitude and longitude, I'm lost. I tried:

location <- as.list(xml_data[["marker"]][["lat"]])

And got a list with 1 row.

How would I go about pulling the latitude and longitude from this XML data?

Sample of data structure:

<markers>
<limited>0</limited>
<marker>
<name>ADMA BioCenters</name>
<category>IQPP Certified</category>
<markertype>
/media/com_storelocator/markers/100713214004000000jl_marker2.png
</markertype>
<featured>false</featured>
<address>
6290 Jimmy Carter Boulevard, Suite 208, Norcross, Georgia 30071
</address>
<lat>33.9290629</lat>
<lng>-84.2204952</lng>
<distance>0</distance>
<fulladdress>
<![CDATA[
<p><img style="margin-left: auto; margin-right: auto;" src="images/jl_marker2.png" alt="jl marker2" width="22" height="22" />IQPP Certified</p>
]]>
</fulladdress>
<phone>678-495-5800</phone>
<url>http://www.atlantaplasma.com</url>
<email/>
<facebook/>
<twitter/>
<tags>
<![CDATA[ ]]>
</tags>
<custom1 name="Custom Field 1">
<![CDATA[ ]]>
</custom1>
<custom2 name="Custom Field 2">
<![CDATA[ ]]>
</custom2>
<custom3 name="Custom Field 3">
<![CDATA[ ]]>
</custom3>
<custom4 name="Custom Field 4">
<![CDATA[ ]]>
</custom4>
<custom5 name="Custom Field 5">
<![CDATA[ ]]>
</custom5>
Ross Wardrup

1 Answer

Use xpathSApply on the original XML rather than going through the list.

lat <- xpathSApply(data, '//marker/lat', xmlValue)
long <- xpathSApply(data, '//marker/lng', xmlValue)

Result:

> head(cbind(lat, long))
     lat          long         
[1,] "33.9290629" "-84.2204952"
[2,] "48.3097292" "14.299297"  
[3,] "41.6134569" "-87.514584" 
[4,] "41.5878273" "-87.3369907"
[5,] "39.98504"   "-83.004705" 
[6,] "43.2056277" "-86.2708023"
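In case the feed URL is unavailable, here is a minimal self-contained sketch of the same idea. It parses a small inline document instead of the real feed: the coordinates are copied from the output above, but the trimmed XML string, the placeholder name "Second marker", and the variable names `sample_xml`, `doc` and `coords` are assumptions made for illustration. It also converts the values to numeric, since xmlValue returns character strings.

```r
library(XML)

# Trimmed stand-in for the feed (an assumption, not the real response);
# "Second marker" is a placeholder name.
sample_xml <- '<markers>
  <limited>0</limited>
  <marker><name>ADMA BioCenters</name><lat>33.9290629</lat><lng>-84.2204952</lng></marker>
  <marker><name>Second marker</name><lat>48.3097292</lat><lng>14.299297</lng></marker>
</markers>'

doc <- xmlParse(sample_xml, asText = TRUE)

# xpathSApply() with xmlValue yields character vectors, so convert explicitly.
coords <- data.frame(
  name = xpathSApply(doc, "//marker/name", xmlValue),
  lat  = as.numeric(xpathSApply(doc, "//marker/lat", xmlValue)),
  lng  = as.numeric(xpathSApply(doc, "//marker/lng", xmlValue)),
  stringsAsFactors = FALSE
)

print(coords)
```

This assumes every marker has exactly one lat and one lng element. If some markers lack one of them, the vectors will have different lengths and the columns will no longer line up.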

Based on @Martin Morgan's comment, I thought it would be good to benchmark different strategies here:

> microbenchmark(xpathSApply(data, '//marker/lat', xmlValue),
                 sapply(data["//marker/lat"], xmlValue),
                 sapply(data["//marker/lat"], as, "numeric"))
Unit: milliseconds
                                        expr       min        lq   median       uq      max neval
 xpathSApply(data, "//marker/lat", xmlValue)  67.03714  97.57796 100.1633 102.1815 213.3031   100
      sapply(data["//marker/lat"], xmlValue)  72.73847 103.63095 106.1037 108.2251 132.6314   100
 sapply(data["//marker/lat"], as, "numeric") 257.16364 346.13708 389.3025 394.3669 598.3736   100

Clearly, the last strategy is the least efficient, which makes sense because it invokes type conversion on each node. But that makes it not a completely fair test, since the last expression yields numeric output while the first two yield character output. Thus a second test:

> microbenchmark(as.numeric(xpathSApply(data, '//marker/lat', xmlValue)), 
                 as.numeric(sapply(data["//marker/lat"], xmlValue)), 
                 sapply(data["//marker/lat"], as, "numeric"))
Unit: milliseconds
                                                    expr       min        lq    median       uq      max neval
 as.numeric(xpathSApply(data, "//marker/lat", xmlValue))  60.29744  80.08186  97.94924 100.9548 189.0797   100
      as.numeric(sapply(data["//marker/lat"], xmlValue))  59.45891  85.47169 103.68015 106.5882 124.5708   100
             sapply(data["//marker/lat"], as, "numeric") 210.92816 339.54831 384.28481 392.0001 481.4498   100

Again, xpathSApply and sapply (with an XPath extraction) yield very similar results. So a modified version of Martin's first solution:

lat <- as.numeric(sapply(data["//marker/lat"], xmlValue))

may be the best strategy here.
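For completeness, the list from xmlToList can still be used. As far as I can tell, the attempt in the question returned a single value because `[[` with a name picks only the first element called "marker". The sketch below is a hedged illustration on a trimmed inline document (the XML string and the names `xml_list`, `markers`, `lat_list` and `lng_list` are assumptions, not part of the original data): it keeps every element named "marker" and pulls one field from each.

```r
library(XML)

# Trimmed stand-in for the feed (an assumption, not the real response).
sample_xml <- '<markers>
  <limited>0</limited>
  <marker><lat>33.9290629</lat><lng>-84.2204952</lng></marker>
  <marker><lat>48.3097292</lat><lng>14.299297</lng></marker>
</markers>'

xml_list <- xmlToList(xmlParse(sample_xml, asText = TRUE))

# `[[` with a name returns only the first matching element.
first_lat <- xml_list[["marker"]][["lat"]]

# Keep every element named "marker", then extract one field from each.
markers  <- xml_list[names(xml_list) == "marker"]
lat_list <- as.numeric(vapply(markers, function(m) m[["lat"]], character(1)))
lng_list <- as.numeric(vapply(markers, function(m) m[["lng"]], character(1)))

print(cbind(lat_list, lng_list))
```

The vapply call assumes each marker has a non-empty lat and lng; a marker with an empty element would make it fail, which is one more reason the XPath route tends to be easier.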

Thomas
    @RossWardrup XML in R is a real pain unless you work with xpath, when it suddenly - possibly magically - becomes much much easier. – Thomas Feb 19 '14 at 21:26

    Maybe a little more succinctly, using the notion that the xpath can subset the data object: `sapply(data["//marker/lat"], xmlValue)`, or using the coercion form as(x, "numeric"): `sapply(data["//marker/lat"], as, "numeric")` – Martin Morgan Feb 19 '14 at 21:28