
I'm having trouble reading an .xml file into a data.frame in R. I know this question already has great answers here and here, but I have not been able to adapt them to my case, so apologies if this is a duplicate.

I have a .xml like this:

<?xml version='1.0' encoding='UTF-8'?>
<LexicalResource>
  <GlobalInformation label="Created with the standard propagation algorithm"/>
  <Lexicon languageCoding="UTF-8" label="sentiment" language="-">
    <LexicalEntry id="id_0" partOfSpeech="adj">
      <Lemma writtenForm="word"/>
      <Sense>
        <Confidence score="0.333333333333" method="automatic"/>
        <Sentiment polarity="negative"/>
        <Domain/>
      </Sense>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>

The file is stored locally. I tried this:

library(XML)
doc <- xmlParse("...\\test2.xml")
xmldf <- xmlToDataFrame(nodes = getNodeSet(doc, "//LexicalEntry/Lemma/Sense/Confidence/Sentiment"))

but the result is this:

> xmldf
data frame with 0 columns and 0 rows

So I tried the xml2 package:

library(xml2)
pg <- read_xml("...test2.xml")

recs <- xml_find_all(pg, "LexicalEntry")

> recs
{xml_nodeset (0)}

I have little experience with manipulating .xml files, so I think I'm missing the point. What am I doing wrong?

s__

1 Answer


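The values you want are stored in attributes, not in element text, which is why xmlToDataFrame gives you nothing useful. Your XPath also matches no nodes, because Sense is a sibling of Lemma rather than a child of it, and Sentiment is a sibling of Confidence. Try something like this: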

data.frame(as.list(xpathApply(doc, "//Lemma", fun = xmlAttrs)[[1]]), 
           as.list(xpathApply(doc, "//Confidence", fun = xmlAttrs)[[1]]), 
           as.list(xpathApply(doc, "//Sentiment", fun = xmlAttrs)[[1]]))

  writtenForm          score    method polarity
1        word 0.333333333333 automatic negative
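
For the xml2 attempt in the question, the empty node set is most likely because a bare "LexicalEntry" only looks at direct children of the root element, and LexicalEntry sits one level deeper, inside Lexicon. The following is a minimal sketch of the same attribute extraction with xml2, assuming the file has the structure shown in the question. The name `xml_string` is only an illustrative inline copy of that file so the example runs without a path.

```r
library(xml2)

# Inline copy of the XML from the question (assumption: the real file
# has the same structure). Replace with read_xml("path/to/file.xml").
xml_string <- '<?xml version="1.0" encoding="UTF-8"?>
<LexicalResource>
  <GlobalInformation label="Created with the standard propagation algorithm"/>
  <Lexicon languageCoding="UTF-8" label="sentiment" language="-">
    <LexicalEntry id="id_0" partOfSpeech="adj">
      <Lemma writtenForm="word"/>
      <Sense>
        <Confidence score="0.333333333333" method="automatic"/>
        <Sentiment polarity="negative"/>
        <Domain/>
      </Sense>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>'

pg <- read_xml(xml_string)

# ".//" searches all descendants of the root, not only its direct children.
entries <- xml_find_all(pg, ".//LexicalEntry")

# xml_find_first() returns one result per entry, and xml_attr() gives NA
# when the node or attribute is absent, so every column has the same length.
df <- data.frame(
  writtenForm = xml_attr(xml_find_first(entries, "./Lemma"), "writtenForm"),
  score       = xml_attr(xml_find_first(entries, "./Sense/Confidence"), "score"),
  method      = xml_attr(xml_find_first(entries, "./Sense/Confidence"), "method"),
  polarity    = xml_attr(xml_find_first(entries, "./Sense/Sentiment"), "polarity"),
  stringsAsFactors = FALSE
)
df
```

All columns come back as character, so score would still need as.numeric() if you want to compute with it.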

Another option is to collect all the attributes in the XML and build a data.frame from them:

df <- data.frame(as.list(unlist(xmlToList(doc, addAttributes = TRUE, simplify = TRUE))))
colnames(df) <- unlist(lapply(strsplit(colnames(df), "\\."), function(x) x[length(x)]))
df
                                            label writtenForm          score    method 
1 Created with the standard propagation algorithm        word 0.333333333333 automatic 
  polarity   id partOfSpeech languageCoding     label language
1 negative id_0          adj          UTF-8 sentiment        -
  • Thanks a lot. If I use the first option on a larger .xml (a bigger version of the one posted), I get this error: `Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0`. What does it mean? – s__ Jul 12 '18 at 13:41
  • I cannot be sure without more details; maybe you can post a slightly bigger example that reproduces the problem. Probably the data.frame is getting a different number of rows for Lemma, Confidence and Sentiment. You can run the calls one at a time (for example `xpathApply(doc, "//Lemma", fun = xmlAttrs)`) and see how to build your data.frame so that you do not get these errors. – Juan Antonio Roldán Díaz Jul 13 '18 at 06:29
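
The check suggested in the last comment can be sketched as follows, with the XML package. This is only an illustration under assumptions: the two-entry `xml_string` is made up (its second entry deliberately has no Sentiment element) and the helper `get_attr` is hypothetical, not part of the package. A missing element like this is one possible cause of the "differing number of rows" error, not a confirmed diagnosis of the larger file.

```r
library(XML)

# Hypothetical two-entry input; the second entry lacks <Sentiment>.
xml_string <- '<LexicalResource>
  <Lexicon label="sentiment">
    <LexicalEntry id="id_0" partOfSpeech="adj">
      <Lemma writtenForm="word"/>
      <Sense>
        <Confidence score="0.333333333333" method="automatic"/>
        <Sentiment polarity="negative"/>
      </Sense>
    </LexicalEntry>
    <LexicalEntry id="id_1" partOfSpeech="noun">
      <Lemma writtenForm="other"/>
      <Sense>
        <Confidence score="0.5" method="automatic"/>
      </Sense>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>'
doc <- xmlParse(xml_string, asText = TRUE)

# Step 1: count matches per XPath. Unequal counts mean the columns
# cannot be lined up by position.
counts <- sapply(c("//Lemma", "//Confidence", "//Sentiment"),
                 function(p) length(getNodeSet(doc, p)))
print(counts)

# Step 2: build one row per LexicalEntry, using NA where a node or
# attribute is missing. get_attr is a hypothetical helper.
get_attr <- function(node, path, name) {
  hit <- getNodeSet(node, path)
  if (length(hit) == 0) return(NA_character_)
  xmlGetAttr(hit[[1]], name, default = NA_character_)
}

entries <- getNodeSet(doc, "//LexicalEntry")
rows <- lapply(entries, function(e) {
  data.frame(
    writtenForm = get_attr(e, "./Lemma", "writtenForm"),
    score       = get_attr(e, "./Sense/Confidence", "score"),
    method      = get_attr(e, "./Sense/Confidence", "method"),
    polarity    = get_attr(e, "./Sense/Sentiment", "polarity"),
    stringsAsFactors = FALSE
  )
})
df <- do.call(rbind, rows)
df
```

Working entry by entry keeps each row tied to its own LexicalEntry, so a gap in one entry shows up as NA instead of shifting or breaking the other columns.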