I'm trying to read this file in R. I tried to use the XML package, but I have no idea what is in the data set and I haven't used the package before.
I'd appreciate any help from you guys.
Thanks.
Sebastián.
There's no way around it: you need to understand XML and XPath to work with this file in R. Assuming you do, view the document in a browser to get an idea of its structure. Then this should get you started with the XML package.
library(XML)
xml <- xmlParse("http://data.mcc.gov/raw/xml/MCC_HN.xml")
org <- xpathApply(xml,"//iati-activity/reporting-org",xmlValue)
id <- xpathApply(xml,"//iati-activity/iati-identifier",xmlValue)
title <- xpathApply(xml,"//iati-activity/title",xmlValue)
desc.1 <- xpathApply(xml,"//iati-activity/description[@type='1']",xmlValue)
desc.2 <- xpathApply(xml,"//iati-activity/description[@type='2']",xmlValue)
status <- xpathApply(xml,"//iati-activity/activity-status",xmlValue)
start.planned <- xpathApply(xml,"//iati-activity/activity-date[@type='start-planned']",xmlValue)
start.actual <- xpathApply(xml,"//iati-activity/activity-date[@type='start-actual']",xmlValue)
end.planned <- xpathApply(xml,"//iati-activity/activity-date[@type='end-planned']",xmlValue)
end.actual <- xpathApply(xml,"//iati-activity/activity-date[@type='end-actual']",xmlValue)
# each xpathApply() call above returns a list with one element per matching node
df <- data.frame(cbind(org, id, title, status,
                       start.planned, start.actual, end.planned, end.actual,
                       desc.1, desc.2))
Read the documentation on the functions I've used above, e.g. xmlParse(...), xpathApply(...), and xmlValue(...), to figure out what the code is doing.
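If it helps to see those three functions on something small before tackling the real file, here is a minimal sketch. The inline document is made up for illustration (it only loosely imitates the layout of the real one), and xpathSApply(...) is the simplifying variant of xpathApply(...) from the same package.

```r
library(XML)

# Hypothetical miniature document, NOT the real MCC file
txt <- '<iati-activities>
  <iati-activity>
    <iati-identifier>A-1</iati-identifier>
    <title>First</title>
    <description type="1">General</description>
    <description type="2">Objectives</description>
  </iati-activity>
  <iati-activity>
    <iati-identifier>A-2</iati-identifier>
    <title>Second</title>
    <description type="1">Other</description>
  </iati-activity>
</iati-activities>'

# xmlParse() builds the document tree; asText = TRUE means "this is XML, not a file name"
doc <- xmlParse(txt, asText = TRUE)

# Count how often each tag occurs, a quick way to see what is in the file
tag.counts <- table(xpathSApply(doc, "//*", xmlName))
print(tag.counts)

# xpathApply() returns a list with one element per matching node
titles <- xpathApply(doc, "//iati-activity/title", xmlValue)

# xpathSApply() simplifies the same result to a character vector
titles.vec <- xpathSApply(doc, "//iati-activity/title", xmlValue)
print(titles.vec)
```

Running the tag count against the real document should give a rough idea of which elements repeat and which appear once per activity.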
One note: there is a function xmlToDataFrame(...) in the XML package. The problem with your document is that you have multiple elements with the same tag name (examples: description and activity-date), which are disambiguated using the type= attribute. xmlToDataFrame(...) doesn't know how to deal with that, so you need to do it the hard way...
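One caveat with the hard way: querying the whole document once per field only lines up if every activity has exactly one of each element. I haven't checked whether that holds for your file. If it doesn't, a possible variant is to loop over the activities and query each one separately, filling in NA where something is missing. This is only a sketch on a made-up inline document; first.value(...) is a hypothetical helper, not part of the XML package.

```r
library(XML)

# Hypothetical miniature document; the second activity has no type="2" description
txt <- '<iati-activities>
  <iati-activity>
    <iati-identifier>A-1</iati-identifier>
    <title>First</title>
    <description type="1">General</description>
    <description type="2">Objectives</description>
  </iati-activity>
  <iati-activity>
    <iati-identifier>A-2</iati-identifier>
    <title>Second</title>
    <description type="1">Other</description>
  </iati-activity>
</iati-activities>'

doc <- xmlParse(txt, asText = TRUE)

# One node per activity; each is queried on its own below
activities <- getNodeSet(doc, "//iati-activity")

# Hypothetical helper: first match of a relative XPath under `node`, or NA if none
first.value <- function(node, path) {
  hits <- xpathSApply(node, path, xmlValue)
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Every column has exactly one entry per activity, so rows stay aligned
df <- data.frame(
  id     = sapply(activities, first.value, "./iati-identifier"),
  title  = sapply(activities, first.value, "./title"),
  desc.1 = sapply(activities, first.value, "./description[@type='1']"),
  desc.2 = sapply(activities, first.value, "./description[@type='2']"),
  stringsAsFactors = FALSE
)
print(df)
```

The trade-off is speed: one XPath query per activity per field is slower than one query per field, which probably only matters for very large files.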
It's not really clear what you want to do with the data, but here is how to parse it:
library(XML)
xml = xmlParse("http://data.mcc.gov/raw/xml/MCC_HN.xml")
Then query the result for all "transaction" records and make them into a data frame:
df <- xmlToDataFrame(xml["//transaction"])
which gives
> dim(df)
[1] 730 11
> head(df, 2)
aid-type
1
2
description
1 Commitment: Honduras-614G Fund-Not Applicable-Not Applicable-2011-04-01
2 Disbursement: Honduras-614G Fund-Not Applicable-Not Applicable-2011-04-01
disbursement-channel finance-code flow-type provider-org
1 Millennium Challenge Corporation
2 Millennium Challenge Corporation
receiver-org tied-status transaction-date transaction-type value
1 Honduras 2011-04-01 COMMITMENT 274380.75
2 Honduras 2011-04-01 DISBURSEMENT 0.00
Maybe you'd like to extract the code attribute of each 'aid-type' element and add it to the data frame; XPath can do that:
df$`aid-type-code` <- as.character(xml["//aid-type/@code"])
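That assignment assumes every transaction has an aid-type element with a code attribute; otherwise the vector is shorter than the data frame. If you aren't sure, a more defensive sketch is to pull the attribute per transaction with xmlGetAttr(...), which accepts a default value. The inline document below is made up for illustration.

```r
library(XML)

# Hypothetical miniature document; the second transaction has no aid-type
txt <- '<activity>
  <transaction>
    <aid-type code="C01">Project</aid-type>
    <value>10</value>
  </transaction>
  <transaction>
    <value>20</value>
  </transaction>
</activity>'

doc <- xmlParse(txt, asText = TRUE)
transactions <- getNodeSet(doc, "//transaction")

# One entry per transaction: the code attribute, or NA when it is absent
aid.code <- sapply(transactions, function(tr) {
  node <- getNodeSet(tr, "./aid-type")
  if (length(node) == 0) NA_character_
  else xmlGetAttr(node[[1]], "code", NA_character_)
})
print(aid.code)
```

The result has one entry per transaction, so it can be assigned as a column without recycling or length errors.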