I am investigating creating XML trees based on R objects using the XML package.

One thing I would like to do is extract information from data frames like the following:

frame2
   rules     support confidence      lift
1      1 0.010230179  1.0000000 78.200000
2      2 0.010230179  0.8000000 78.200000
3      3 0.010230179  1.0000000 65.166667
4      4 0.010230179  0.6666667 65.166667
5      5 0.012787724  0.8333333 54.305556
6      6 0.012787724  0.8333333 54.305556
7      7 0.010230179  0.6666667 26.066667
8      8 0.010230179  0.4000000 26.066667
9      9 0.007672634  0.6000000 26.066667
10    10 0.007672634  0.3333333 26.066667
11    11 0.007672634  0.6000000 21.327273
12    12 0.007672634  0.2727273 21.327273
13    13 0.007672634  0.4285714 16.757143
14    14 0.007672634  0.3000000 16.757143
15    15 0.010230179  0.6666667 26.066667
16    16 0.010230179  0.4000000 26.066667
17    17 0.007672634  0.3333333 10.861111
18    18 0.007672634  0.2500000 10.861111
19    19 0.007672634  0.3750000 13.329545
20    20 0.007672634  0.2727273 13.329545
21    21 0.007672634  0.3750000 11.278846
22    22 0.007672634  0.2307692 11.278846
23    23 0.007672634  0.3750000 18.328125
24    24 0.007672634  0.3750000 18.328125
25    25 0.007672634  0.4285714 13.964286
26    26 0.007672634  0.2500000 13.964286
27    27 0.007672634  0.4285714 11.171429
28    28 0.007672634  0.2000000 11.171429
29    29 0.007672634  0.3000000 11.730000
30    30 0.007672634  0.3000000 11.730000
31    31 0.007672634  0.2727273  8.886364
32    32 0.007672634  0.2500000  8.886364
33    33 0.007672634  0.3333333 10.861111
34    34 0.007672634  0.2500000 10.861111
35    35 0.007672634  0.3000000 11.730000
36    36 0.007672634  0.3000000 11.730000
37    37 0.007672634  0.3000000  9.775000
38    38 0.007672634  0.2500000  9.775000
39    39 0.007672634  0.2727273  8.202797
40    40 0.007672634  0.2307692  8.202797
41    41 0.007672634  0.2307692  8.202797
42    42 0.007672634  0.2727273  8.202797
43    43 0.007672634  0.2307692  6.015385
44    44 0.007672634  0.2000000  6.015385
45    45 0.010230179  0.8000000 31.280000
46    46 0.010230179  1.0000000 65.166667
47    47 0.010230179  1.0000000 65.166667

into XML trees like the following:

root
  1
   support=0.010230179
   confidence=1.0000000 
   lift=78.200000
  /1
  2 
   support=0.010230179
   confidence=0.8000000
   lift=78.200000
  /2
  47
   support=0.010230179
   confidence=1.0000000
   lift=65.166667
  /47
/root

So far, I have been able to create the 47 child nodes using the following commands.

root<-newXMLNode("root")
sapply(frame2$rules,newXMLNode,parent=root)

However, I have been unable to add the elements support, confidence, and lift with their appropriate values.

This leads me to the following two questions:

  1. How can I define either the elements or attributes support, confidence, and lift for each of the 47 child nodes?

  2. How can I fill in their respective values based on the values present in frame2?

Thank You Very Much.

  • A related question: http://stackoverflow.com/questions/6256064/how-to-create-xml-from-r-objects-e-g-is-there-a-listtoxml-function – Thell Aug 20 '12 at 17:10

1 Answer

Basic problems:

  • You want to loop over your data frame by row.

apply with MARGIN=1 is your friend there:

> invisible(apply(df, MARGIN=1, print))
      rules     support  confidence        lift 
 1.00000000  0.01023018  1.00000000 78.20000000 
      rules     support  confidence        lift 
 2.00000000  0.01023018  0.80000000 78.20000000 
      rules     support  confidence        lift 
 3.00000000  0.01023018  1.00000000 65.16666700 

  • You want to convert these rows to XML elements (named after the rules column), with attributes for the other columns.

newXMLNode has an attrs argument which takes a named vector to produce attributes:

> newXMLNode(as.character(df[1,"rules"]), attrs=df[1,])    
<1 rules="1" support="0.010230179" confidence="1" lift="78.2"/>

To answer the question precisely, we also have to drop the rules attribute by subsetting, although I'm not very happy with that approach:

> newXMLNode(as.character(df[1,"rules"]), attrs=df[1,-1])    
<1 support="0.010230179" confidence="1" lift="78.2"/>
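
One caveat with the numeric element names above: XML element names may not begin with a digit, so `<1 .../>` is not well-formed XML and strict parsers will reject it. If per-rule element names are really wanted, a minimal sketch of one workaround is to add a prefix. The `rule` prefix and the two-row stand-in for the data frame are assumptions made for illustration, not part of the original question.

```r
library(XML)

# Hypothetical two-row stand-in for the data frame in the question
df <- data.frame(rules = 1:2,
                 support = c(0.010230179, 0.010230179),
                 confidence = c(1, 0.8),
                 lift = c(78.2, 78.2))

# Prefix the rule number so the element name starts with a letter
node_name <- paste0("rule", df[1, "rules"])

# Build a named character vector of attributes from the remaining columns
vals <- unlist(df[1, -1])
attrs <- setNames(as.character(vals), names(vals))

node <- newXMLNode(node_name, attrs = attrs)
node
```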

I'd suggest not creating an XML schema with unpredictable element names anyway, because it's usually harder to parse and validate in recipient code. It's much better practice to encode this with a fixed element name, like

> newXMLNode("observation", attrs=df[1,])    
<observation rules="1" support="0.010230179" confidence="1" lift="78.2"/>

Combining the above two and adding a root node:

> children <- apply(df, MARGIN=1, function(row) newXMLNode("observation", attrs=row))
> root <- newXMLNode("frame2", .children = children)

> root
<frame2>
  <observation rules="1" support="0.010230179" confidence="1" lift="78.2"/>
  <observation rules="2" support="0.010230179" confidence="0.8" lift="78.2"/>
  <observation rules="3" support="0.010230179" confidence="1" lift="65.166667"/>
</frame2> 
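
If child elements are preferred over attributes, as in the tree sketched in the question, the same row-wise idea can be written with an explicit loop. This is only a sketch under assumptions: the three-row `df` is a hypothetical stand-in for frame2, and the element name `rule` and the attribute name `id` are illustrative choices, not something the question prescribes.

```r
library(XML)

# Hypothetical three-row stand-in for frame2
df <- data.frame(rules = 1:3,
                 support = c(0.010230179, 0.010230179, 0.010230179),
                 confidence = c(1, 0.8, 1),
                 lift = c(78.2, 78.2, 65.166667))

root <- newXMLNode("frame2")

for (i in seq_len(nrow(df))) {
  # One <rule> element per row, identified by an id attribute
  rule <- newXMLNode("rule",
                     attrs = c(id = as.character(df$rules[i])),
                     parent = root)
  # One child element per remaining column, with the value as text content
  for (col in setdiff(names(df), "rules")) {
    newXMLNode(col, as.character(df[i, col]), parent = rule)
  }
}

root
```

Looping over row indices rather than using apply avoids the coercion of the data frame to a matrix, which matters once the columns are not all numeric.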
themel