5

I have a XML file with nodes having the same name and I can't find out how to select the data I want in order to create a dataframe. I'm new to R and XML and I can't seem to make it work!

My XML looks like this (only the beginning):

<GL_Document xmlns="">
    <time_Period.timeInterval>...</time_Period.timeInterval>
    <TimeSeries>
        <mRID>1</mRID>
        <MktPSRType>
            <psrType>ProdType1</psrType>
        </MktPSRType>
        <Period>
            <timeInterval>
                <start>2015-12</start>
                <end>2016-11</end>
            </timeInterval>
            <resolution>PT60M</resolution>
            <Point>
                <position>1</position>
                <quantity>9</quantity>
            </Point>
            <Point>
                <position>2</position>
                <quantity>7</quantity>
            </Point>
            <Point>
                <position>3</position>
                <quantity>9</quantity>
            </Point>

The code contains different nodes named "TimeSeries" with the same type of data (here Period is not developed in order to see the whole structure)

<GL_Document xmlns="">
    <time_Period.timeInterval>...</time_Period.timeInterval>
    <TimeSeries>
        <mRID>1</mRID>
        <MktPSRType>
            <psrType>ProdType1</psrType>
        </MktPSRType>
        <Period>...</Period>
    </TimeSeries>
    <TimeSeries>
        <mRID>2</mRID>
        <MktPSRType>
            <psrType>ProdType2</psrType>
        </MktPSRType>
        <Period>...</Period>
    </TimeSeries>
    <TimeSeries>...</TimeSeries>
    <TimeSeries>...</TimeSeries>
    <TimeSeries>...</TimeSeries>
    <TimeSeries>...</TimeSeries>
    <TimeSeries>...</TimeSeries>
</GL_Document>

I want to get the data in the following format:

psrType    position   quantity  
ProdType1   1         9
...        ...        ...
ProdType2   1         ...

I have tried to apply the solutions as mentioned in this post: How to parse XML to R data frame But unsuccessfully. My code looks like this, but I the resulting dataframe contains no data:

xml.url3<-getURL(xml.file3)
doc<-xmlParse(xml.url3)

position_path<-"//GL_Document/TimeSeries/Period/Point/position"
quantity_path<-"//GL_Document/TimerSeries/Period/Point/quantity"

df<-data.frame(
  pos=as.integer(sapply(doc[position_path],as,"integer")),
  quant=as.integer(sapply(doc[quantity_path],as,"integer")))

Thank you very much for your help!

Community
  • 1
  • 1
Daldal
  • 73
  • 5
  • Ok, so finally this is what I did: ` xml.fileexp<-"https://website.com" xml.urlexp<-getURL(xml.fileexp) doc<-xmlParse(xml.urlexp) df <- ldply(xmlToList(doc), data.frame) dfexp<-df[df$.id=="TimeSeries",]` – Daldal Nov 21 '16 at 14:48
  • Ok sorry, I have to comment twice.. not time left for editing! Then I had a dataframe with every thing inside my timeseries nodes listed (in rows). As starting and stoping time were given, I could write a code taking into account the number of positions in the interval.. Probably not the most clever solutionm but it works! – Daldal Nov 21 '16 at 14:56

2 Answers2

2

Consider using XML's xpathSApply() to run direct XPath expressions. For the psrType value, use ancestor::* which data.frame() maps the one value per <TimeSeries> to each corresponding values of period and quantity:

library(XML)

xml.url3 <- getURL(xml.file3)
doc <- xmlParse(xml.url3)

period <- xpathSApply(doc, "//Point/position", xmlValue)
quantity <- xpathSApply(doc, "//Point/quantity", xmlValue)
psrType <- xpathSApply(doc, "//Point/ancestor::TimeSeries/MktPSRType/psrType", xmlValue)

df <- data.frame(psrType = psrType,
                 period = as.integer(period),
                 quantity = as.integer(quantity))

df
#     psrType period quantity
# 1 ProdType2      1        9
# 2 ProdType2      2        7
# 3 ProdType2      3        9
Parfait
  • 104,375
  • 17
  • 94
  • 125
0

I think you might find what you need here: How to transform XML data into a data.frame?

This xmlToList() looks about right. You can then get it into a data framee by using one of the apply methods. There is a grea explanation at the above link!

Community
  • 1
  • 1
sconfluentus
  • 4,693
  • 1
  • 21
  • 40