For tl;dr I have a simple question at the bottom:
I'm trying to turn XML files into use-able tables in R.
<toes copyright='(C)version='1.1'> <generated date='2017-01-21
07:45:04'timestamp='1485006304'/>
<description> Active TOE vehicle levels and adjustments for the current
campaign up to the RDP cycle in progress. c0 = the cycle 0 capacity, adj
= comma-separated list of cycle:capacity adjustments, cur = current
capacity </description>
<defaults><def att='adj' value=''/></defaults>
<r toe="deairfor" veh="22" c0="30" cur="30"/>
<r toe="deairfor" veh="23" c0="40" cur="20" adj="1:35,2:20"/>
<r toe="deairfor" veh="26" c0="2" cur="2" adj="2:10,3:30"/>
</toes>
My intended format is this:
"TOE" "Veh" "c0" "cur" "adj1" "adj2" "adj3"
"deairfor" 22 30 30 NA NA NA
"deairfor" 23 40 20 35 20 NA
"deairfor" 26 2 2 NA 10 30
I have zero experience with importing XML files but I think this file is not formatted properly as I haven't encountered any XML example with data inside the tags like in < r toe "...data..."/>. I have been able to extract the data with the following:
library(XML)
source <- "http://wiretap.wwiionline.com/xml/toes.sheet.xml"
xmlfile <- xmlTreeParse(source, useInternalNodes = TRUE)
nodes <- getNodeSet(xmlfile, "/toes//r")
Df1 <- NULL
for(i in 1:length(nodes)) {
Df1 <- t(xmlToList(nodes[[i]]))
Df2 <- smartbind(Df2,Df1[1,])
}
I was only able to extract 1 row at a time, so I use the later code to bind these together. I needed the df1/2 otherwise it would error out on i = 1. Probably much easier in a different way but I couldn't get it working.
This leaves me with a dataframe Df2, with all the variables as "factor" (why?)
"TOE" "Veh" "c0" "cur" "adj"
deairfor 22 30 30 NA
deairfor 23 40 20 35 1:35,2:20
deairfor 26 2 2 2 2:10,3:30
So the difficulty lies now in this "adj" column. I can separate it with the following:
Df2 <- separate(data = Df2, col = adj, into = c("adj1", adj2","adj3"), sep = "\\,")
Df2 <- separate(data = Df2, col = adj1, into = c("adj1","adj1value"), sep = "\\:")
Df2 <- separate(data = Df2, col = adj2, into = c("adj2","adj2value"), sep = "\\:")
Df2 <- separate(data = Df2, col = adj3, into = c("adj3","adj3value"), sep = "\\:")
But the cells are not in the right columns. Df2 now is as below:
"TOE" "Veh" "c0" "cur" "adj1" "adj1value" "adj2" "adj2value" "adj3" "adj3value"
deairfor 22 30 30 NA NA NA NA NA NA
deairfor 23 40 20 1 35 2 20 NA NA
deairfor 26 2 2 2 10 3 30 NA NA
While this last row needs to be: (once the adj1values are in the proper columns we can also drop adj1/adj2/adj3)
deairfor 26 2 2 NA NA 2 10 3 30
I've tried numerous ways to move these cells to the right, but constantly get errors, e.g.: (the adj* columns are characters after separating hence the "1")
Df2$adj3[Df2$adj1 == "1"] <- Df2$adj2
Df2$adj3value[Df2$adj1 == "1"] <- Df2$adj2value
"NAs are not allowed in subscripted assignments"
So question: How do I move these values to the proper column?
"TOE" "Veh" "c0" "cur" "adj"
deairfor 26 2 2 2:10,3:30
Should become
"TOE" "Veh" "c0" "cur" "adj1" "adj2" "adj3"
deairfor 26 2 2 NA 10 30
Bonus question: I get the feeling I need to use many lines because the XML import at the beginning isn't quite optimal, anyway to do that better given the goal that I have?