I am trying to convert XML content to a data frame. The XML is as follows:

<group>
    <data>
        <metadata>
            <meta content="6 cyl" name="engine"/>
            <meta content="55" name="mpg"/>
            <meta content="2700" name="weight"/>
        </metadata>
    </data>
    <data>
        <metadata>
            <meta content="3 cyl" name="engine"/>
            <meta content="65" name="mpg"/>
            <meta content="2420" name="weight"/>
        </metadata>
    </data>
</group>

and I want the DataFrame as follows:

engine   mpg   weight
6 cyl    55    2700
3 cyl    65    2420

I tried this:

data <- read_xml("myFile.xml")
meta <- data %>% xml_find_all("//meta")
df <- data.frame(name = sapply(meta %>% xml_attr("name"), as.character),
                  content = sapply(meta %>% xml_attr("content"), as.character))

But it produces this DataFrame:

name      content
engine    6 cyl
mpg       55
weight    2700
engine    3 cyl
mpg       65
weight    2420

Then I tried to reshape it:

df <- df %>% spread(unique(name), content)

This produces an error:

Error: Duplicate identifiers for rows....

Is my approach correct, or is there another way to achieve this?

Paradox

2 Answers

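spread() requires every row of the result to be uniquely identified by the columns that are left after the key and value columns are removed. Here name and content are the only columns, so the two engine rows (and likewise the two mpg and weight rows) cannot be told apart. There's some good discussion here: https://community.rstudio.com/t/spread-why-errors/2076/3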

This should give you what you want:

# Give each repeat of a name its own id so spread() can tell the rows apart.
df %>% group_by(name) %>% mutate(id = row_number()) %>% ungroup() %>%
  spread(name, content) %>% select(-id)
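
An alternative that avoids the reshape step is to build one row per <data> element directly. The following is only a minimal sketch, assuming the xml2 package is installed; the XML from the question is inlined as a string so the example is self-contained, and the names xml_string, records, rows and df_wide are illustrative, not from the original post.

```r
library(xml2)

# Inline copy of the XML from the question (assumption: same structure as myFile.xml).
xml_string <- '<group>
    <data>
        <metadata>
            <meta content="6 cyl" name="engine"/>
            <meta content="55" name="mpg"/>
            <meta content="2700" name="weight"/>
        </metadata>
    </data>
    <data>
        <metadata>
            <meta content="3 cyl" name="engine"/>
            <meta content="65" name="mpg"/>
            <meta content="2420" name="weight"/>
        </metadata>
    </data>
</group>'

doc <- read_xml(xml_string)

# Each <data> element corresponds to one row of the result.
records <- xml_find_all(doc, "//data")

rows <- lapply(seq_along(records), function(i) {
  # Only the <meta> elements that belong to this <data> element.
  meta <- xml_find_all(records[[i]], ".//meta")
  # Column names come from the "name" attribute, values from "content".
  values <- setNames(xml_attr(meta, "content"), xml_attr(meta, "name"))
  as.data.frame(as.list(values), stringsAsFactors = FALSE)
})

# Stack the one-row data frames; this assumes every <data> has the same names.
df_wide <- do.call(rbind, rows)
print(df_wide)
```

Because the grouping comes from the document structure itself, no artificial id column is needed. All columns come back as character, since xml_attr() returns strings.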
JeffR

XML to Data Frame

To handle the data in large files effectively, we read the XML file into a data frame and then process the data frame for analysis.

# Load the packages required to read XML files.
library("XML")
library("methods")

# Convert the input xml file to a data frame.
xmldataframe <- xmlToDataFrame("input.xml")
print(xmldataframe)

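xmlToDataFrame() reads the text of child elements, not attributes, so it suits XML in which each value is stored as element text (for example <mpg>55</mpg>). The attribute-based XML in the question would have to be converted to that form first. With the values as element text, the above code produces the following result:

engine   mpg   weight
6 cyl    55    2700
3 cyl    65    2420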

Now that the data is available as a data frame, we can use data frame functions to read and manipulate it.
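
As one example of such manipulation: values read from XML typically arrive as character strings, so numeric columns usually need converting before analysis. A minimal sketch using base R only, assuming a data frame shaped like the desired result (the name cars_df is illustrative):

```r
# Hypothetical data frame holding the values from the question as strings.
cars_df <- data.frame(
  engine = c("6 cyl", "3 cyl"),
  mpg    = c("55", "65"),
  weight = c("2700", "2420"),
  stringsAsFactors = FALSE
)

# Convert each column to its most natural type; as.is = TRUE keeps
# non-numeric columns as character instead of turning them into factors.
cars_df[] <- lapply(cars_df, type.convert, as.is = TRUE)

str(cars_df)

# Numeric columns can now be used in calculations.
mean(cars_df$mpg)
```

The engine column stays as character because "6 cyl" cannot be parsed as a number.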

mrk