I am trying to create a data frame containing the values of a few fields extracted from an XML file. I am new to XML files and am not sure how to approach this.
I have tried following the instructions posted in "How to parse XML to R data frame", but I haven't been able to get it to work. The XML is formatted like this, with many other fields (not shown) within GENESET:
<?xml version="1.0" encoding="ISO-8859-1"?>
<MSIGDB BUILD_DATE="Jul 12, 2018" VERSION="6.2" NAME="msigdb">
<GENESET VALIDATION_DATASETS="" CATEGORY_CODE="C3" EXACT_SOURCE="GOID: 00098" STANDARD_NAME="AAANWWTGC_UNKNOWN"/>
Ideally, I would like each column of the data frame to hold the values of one field within GENESET (e.g. column 1 = CATEGORY_CODE; column 2 = EXACT_SOURCE), with one row per GENESET. I'd also like the data frame to contain NA wherever a field is blank for a specific GENESET.
I have tried this:
require(XML)
doc <- xmlParse("msigdb_v6.2.xml")
xml_data <- xmlToList(doc)
exactSource <- as.list(xml_data[["MSIGDB"]][["GENESET"]][["EXACT_SOURCE"]])
but the output of head(exactSource) is
list()
Please help
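
For reference, here is a minimal sketch of one possible approach, not a definitive solution. It assumes the fields of interest are stored as attributes on each GENESET element, as in the snippet above, and it reads them with getNodeSet and xmlGetAttr from the XML package. The inline sample stands in for msigdb_v6.2.xml; the second GENESET ("EXAMPLE_SET") and the helper get_field are hypothetical and only there to show the NA handling.

```r
library(XML)

# Inline sample standing in for msigdb_v6.2.xml.
# The second GENESET is hypothetical: it lacks EXACT_SOURCE on purpose.
xml_text <- '<MSIGDB BUILD_DATE="Jul 12, 2018" VERSION="6.2" NAME="msigdb">
  <GENESET VALIDATION_DATASETS="" CATEGORY_CODE="C3" EXACT_SOURCE="GOID: 00098" STANDARD_NAME="AAANWWTGC_UNKNOWN"/>
  <GENESET CATEGORY_CODE="C2" STANDARD_NAME="EXAMPLE_SET"/>
</MSIGDB>'

# For the real file this would be: doc <- xmlParse("msigdb_v6.2.xml")
doc <- xmlParse(xml_text, asText = TRUE)

# Select every GENESET element in the document.
genesets <- getNodeSet(doc, "//GENESET")

# Attributes to turn into columns; add more names here as needed.
fields <- c("CATEGORY_CODE", "EXACT_SOURCE")

# Hypothetical helper: return one attribute of a node,
# or NA when the attribute is missing or an empty string.
get_field <- function(node, field) {
  value <- xmlGetAttr(node, field, default = NA_character_)
  if (is.na(value) || value == "") NA_character_ else value
}

# Build one character vector per field, then combine them into a data frame.
columns <- lapply(fields, function(f) {
  vapply(genesets, get_field, character(1), field = f)
})
names(columns) <- fields
df <- as.data.frame(columns, stringsAsFactors = FALSE)

print(df)
```

The original attempt probably returns list() because xmlToList drops the root element, so xml_data[["MSIGDB"]] is NULL. The values are also attributes of GENESET rather than child elements, which is why the sketch reads them with xmlGetAttr.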