I am trying to parse the child in a tree identify by an ID. This is how my XML looks like:
<Reporte xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<nombre>PML</nombre>
<proceso>MDA</proceso>
<sistema>BCS</sistema>
<area>PÚBLICA</area>
<Resultados>
<Nodo>
<clv_nodo>07CAB-115</clv_nodo>
<Valores>
<Valor>
<fecha>2017-03-15</fecha>
<hora>1</hora>
<pml>1688.02</pml>
<pml_ene>1638.38</pml_ene>
<pml_per>49.64</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>2</hora>
<pml>1446.18</pml>
<pml_ene>1405.81</pml_ene>
<pml_per>40.36</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>3</hora>
<pml>1389.31</pml>
<pml_ene>1351.85</pml_ene>
<pml_per>37.46</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>4</hora>
<pml>1337.1</pml>
<pml_ene>1301.93</pml_ene>
<pml_per>35.17</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>5</hora>
<pml>1532.75</pml>
<pml_ene>1492.39</pml_ene>
<pml_per>40.36</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>6</hora>
<pml>1729.85</pml>
<pml_ene>1683.15</pml_ene>
<pml_per>46.71</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>7</hora>
<pml>1698.2</pml>
<pml_ene>1650.29</pml_ene>
<pml_per>47.92</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>8</hora>
<pml>1700.84</pml>
<pml_ene>1649.62</pml_ene>
<pml_per>51.23</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>9</hora>
<pml>1708.53</pml>
<pml_ene>1652.13</pml_ene>
<pml_per>56.4</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>10</hora>
<pml>1798.48</pml>
<pml_ene>1735.19</pml_ene>
<pml_per>63.29</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>11</hora>
<pml>1656.64</pml>
<pml_ene>1595.8</pml_ene>
<pml_per>60.84</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>12</hora>
<pml>1712.9</pml>
<pml_ene>1648.41</pml_ene>
<pml_per>64.48</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>13</hora>
<pml>1787.72</pml>
<pml_ene>1719.13</pml_ene>
<pml_per>68.59</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>14</hora>
<pml>1851.01</pml>
<pml_ene>1779.59</pml_ene>
<pml_per>71.43</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>15</hora>
<pml>1950.51</pml>
<pml_ene>1873.83</pml_ene>
<pml_per>76.67</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>16</hora>
<pml>1661.94</pml>
<pml_ene>1595.87</pml_ene>
<pml_per>66.07</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>17</hora>
<pml>1740.8</pml>
<pml_ene>1671.24</pml_ene>
<pml_per>69.56</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>18</hora>
<pml>1895.51</pml>
<pml_ene>1820.19</pml_ene>
<pml_per>75.32</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>19</hora>
<pml>2074.18</pml>
<pml_ene>1990.16</pml_ene>
<pml_per>84.02</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>20</hora>
<pml>1959.91</pml>
<pml_ene>1878.22</pml_ene>
<pml_per>81.7</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>21</hora>
<pml>1791.66</pml>
<pml_ene>1719.24</pml_ene>
<pml_per>72.43</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>22</hora>
<pml>1986.59</pml>
<pml_ene>1909.79</pml_ene>
<pml_per>76.8</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>23</hora>
<pml>1709.51</pml>
<pml_ene>1648.21</pml_ene>
<pml_per>61.29</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>24</hora>
<pml>1539.04</pml>
<pml_ene>1488.58</pml_ene>
<pml_per>50.47</pml_per>
<pml_cng>0</pml_cng>
</Valor>
</Valores>
</Nodo>
<Nodo>
<clv_nodo>07BLE-115</clv_nodo>
<Valores>
<Valor>
<fecha>2017-03-15</fecha>
<hora>1</hora>
<pml>1646.19</pml>
<pml_ene>1638.38</pml_ene>
<pml_per>7.81</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>2</hora>
<pml>1413.36</pml>
<pml_ene>1405.81</pml_ene>
<pml_per>7.55</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>3</hora>
<pml>1358.96</pml>
<pml_ene>1351.85</pml_ene>
<pml_per>7.11</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>4</hora>
<pml>1308.71</pml>
<pml_ene>1301.93</pml_ene>
<pml_per>6.77</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>5</hora>
<pml>1499.65</pml>
<pml_ene>1492.39</pml_ene>
<pml_per>7.27</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>6</hora>
<pml>1690.93</pml>
<pml_ene>1683.15</pml_ene>
<pml_per>7.78</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>7</hora>
<pml>1659.78</pml>
<pml_ene>1650.29</pml_ene>
<pml_per>9.49</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>8</hora>
<pml>1661.41</pml>
<pml_ene>1649.62</pml_ene>
<pml_per>11.79</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>9</hora>
<pml>1662.14</pml>
<pml_ene>1652.13</pml_ene>
<pml_per>10</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>10</hora>
<pml>1744.71</pml>
<pml_ene>1735.19</pml_ene>
<pml_per>9.52</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>11</hora>
<pml>1604.34</pml>
<pml_ene>1595.8</pml_ene>
<pml_per>8.54</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>12</hora>
<pml>1657.17</pml>
<pml_ene>1648.41</pml_ene>
<pml_per>8.76</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>13</hora>
<pml>1728.18</pml>
<pml_ene>1719.13</pml_ene>
<pml_per>9.05</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>14</hora>
<pml>1789.15</pml>
<pml_ene>1779.59</pml_ene>
<pml_per>9.56</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>15</hora>
<pml>1883.75</pml>
<pml_ene>1873.83</pml_ene>
<pml_per>9.92</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>16</hora>
<pml>1599.54</pml>
<pml_ene>1595.87</pml_ene>
<pml_per>3.67</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>17</hora>
<pml>1674.73</pml>
<pml_ene>1671.24</pml_ene>
<pml_per>3.49</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>18</hora>
<pml>1823.54</pml>
<pml_ene>1820.19</pml_ene>
<pml_per>3.35</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>19</hora>
<pml>1992.57</pml>
<pml_ene>1990.16</pml_ene>
<pml_per>2.41</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>20</hora>
<pml>1885.48</pml>
<pml_ene>1878.22</pml_ene>
<pml_per>7.26</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>21</hora>
<pml>1726.7</pml>
<pml_ene>1719.24</pml_ene>
<pml_per>7.46</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>22</hora>
<pml>1914.07</pml>
<pml_ene>1909.79</pml_ene>
<pml_per>4.28</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>23</hora>
<pml>1653.54</pml>
<pml_ene>1648.21</pml_ene>
<pml_per>5.32</pml_per>
<pml_cng>0</pml_cng>
</Valor>
<Valor>
<fecha>2017-03-15</fecha>
<hora>24</hora>
<pml>1495.33</pml>
<pml_ene>1488.58</pml_ene>
<pml_per>6.75</pml_per>
<pml_cng>0</pml_cng>
</Valor>
</Valores>
</Nodo>
</Resultados>
<status>OK</status>
</Reporte>
I am trying to parse "Nodo" data by node index and expand it to the number of its child "Valor" elements, then column bind with "Valor" data.
I am literally using the description from this post:
[R XML - combining parent and child nodes into data frame
So far my code looks like this:
library(XML)
library(plyr)
library(purrr)
api <- function(path) {
url1 <- modify_url("https://ws01.cenace.gob.mx", port = "8082", path = path)
GET(url1)
}
resp <- api("/SWPML/SIM/BCS/MDA/07CAB-115,07BLE-115/2017/03/15/2017/03/15/XML")
url1 <- xmlParse(resp)
mtg_num <- length(xpathSApply(url1, "//Nodo"))
#I am using same names as example
#meeting_list is a 0X0 list
meeting_list <- lapply(seq(mtg_num), function(i) {
races_num <- length(xpathSApply(url1, sprintf("//Resultados[%s]/Valores", i)))
data.frame(
meeting_id = rep(xpathSApply(url1, sprintf("//clv_nodo", i)), races_num)
)
})
final_df <- cbind(do.call(rbind, meeting_list),
xmlToDataFrame(nodes = getNodeSet(url1, "//Valores/Valor")),
XML:::xmlAttrsToDataFrame(getNodeSet(url1, "//Valores/Valor")))
Since meeting_list is 0X0 when I try to bind it to "Valor" it gives me an error:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 0, 48
In addition: Warning message:
In combineNamedVectors(lapply(doc, xmlAttrs), attrs, omit, ...) :
no elements to combine across records
How can I possible parse correctly "clv_nodo" and expand it to the 48 values in "Valores".
My desire data frame looks like this:
clv_nodo fecha hora pml pml_ene pml_per pml_cng
1 07CAB-115 2017-03-15 1 1688.02 1638.38 49.64 0
2 07CAB-115 2017-03-15 2 1446.18 1405.81 40.36 0
3 07CAB-115 2017-03-15 3 1389.31 1351.85 37.46 0
4 07CAB-115 2017-03-15 4 1337.1 1301.93 35.17 0
5 07CAB-115 2017-03-15 5 1532.75 1492.39 40.36 0
6 07CAB-115 2017-03-15 6 1729.85 1683.15 46.71 0
7 07CAB-115 2017-03-15 7 1698.2 1650.29 47.92 0
8 07CAB-115 2017-03-15 8 1700.84 1649.62 51.23 0
9 07CAB-115 2017-03-15 9 1708.53 1652.13 56.4 0
10 07CAB-115 2017-03-15 10 1798.48 1735.19 63.29 0
11 07CAB-115 2017-03-15 11 1656.64 1595.8 60.84 0
12 07CAB-115 2017-03-15 12 1712.9 1648.41 64.48 0
13 07CAB-115 2017-03-15 13 1787.72 1719.13 68.59 0
14 07CAB-115 2017-03-15 14 1851.01 1779.59 71.43 0
15 07CAB-115 2017-03-15 15 1950.51 1873.83 76.67 0
16 07CAB-115 2017-03-15 16 1661.94 1595.87 66.07 0
17 07CAB-115 2017-03-15 17 1740.8 1671.24 69.56 0
18 07CAB-115 2017-03-15 18 1895.51 1820.19 75.32 0
19 07CAB-115 2017-03-15 19 2074.18 1990.16 84.02 0
20 07CAB-115 2017-03-15 20 1959.91 1878.22 81.7 0
21 07CAB-115 2017-03-15 21 1791.66 1719.24 72.43 0
22 07CAB-115 2017-03-15 22 1986.59 1909.79 76.8 0
23 07CAB-115 2017-03-15 23 1709.51 1648.21 61.29 0
24 07CAB-115 2017-03-15 24 1539.04 1488.58 50.47 0
25 07BLE-115 2017-03-15 1 1646.19 1638.38 7.81 0
26 07BLE-115 2017-03-15 2 1413.36 1405.81 7.55 0
27 07BLE-115 2017-03-15 3 1358.96 1351.85 7.11 0
28 07BLE-115 2017-03-15 4 1308.71 1301.93 6.77 0
29 07BLE-115 2017-03-15 5 1499.65 1492.39 7.27 0
30 07BLE-115 2017-03-15 6 1690.93 1683.15 7.78 0
31 07BLE-115 2017-03-15 7 1659.78 1650.29 9.49 0
32 07BLE-115 2017-03-15 8 1661.41 1649.62 11.79 0
33 07BLE-115 2017-03-15 9 1662.14 1652.13 10 0
34 07BLE-115 2017-03-15 10 1744.71 1735.19 9.52 0
35 07BLE-115 2017-03-15 11 1604.34 1595.8 8.54 0
36 07BLE-115 2017-03-15 12 1657.17 1648.41 8.76 0
37 07BLE-115 2017-03-15 13 1728.18 1719.13 9.05 0
38 07BLE-115 2017-03-15 14 1789.15 1779.59 9.56 0
39 07BLE-115 2017-03-15 15 1883.75 1873.83 9.92 0
40 07BLE-115 2017-03-15 16 1599.54 1595.87 3.67 0
41 07BLE-115 2017-03-15 17 1674.73 1671.24 3.49 0
42 07BLE-115 2017-03-15 18 1823.54 1820.19 3.35 0
43 07BLE-115 2017-03-15 19 1992.57 1990.16 2.41 0
44 07BLE-115 2017-03-15 20 1885.48 1878.22 7.26 0
45 07BLE-115 2017-03-15 21 1726.7 1719.24 7.46 0
46 07BLE-115 2017-03-15 22 1914.07 1909.79 4.28 0
47 07BLE-115 2017-03-15 23 1653.54 1648.21 5.32 0
48 07BLE-115 2017-03-15 24 1495.33 1488.58 6.75 0
UPDATE Final answer
Dave2e's code worked perfectly for the two IDS that I am parsing in this example (07CAB-115, 07BLE-115) but I need to parse 2500 different IDS. Some of them have empty nodes at a "Valores" node. For example for this one:
<Nodo>
<clv_nodo>07ASJ-115</clv_nodo>
<Valores>
so when I ran Dave2e code I got:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 1, 0
This is because in the last part I get a nested list with the last one being 0x0 df<- bind_rows(valornodes)
then I was binding the ID with this empty list. The solution, as Dave2e suggested, was filtering clvs
by the empty nodes in Valores
. In this way the match between the nested list and the list of ID's is correct. This is the final code:
api <- function(path) {
url1 <- modify_url("https://ws01.cenace.gob.mx", port = "8082", path = path)
GET(url1)
}
resp <- api("/SWPML/SIM/BCS/MDA/07CAB-115,07BLE-115,07CAD-115,07ASJ-115/2017/03/26/2017/04/01/XML")
z <- read_xml(resp)
parents <-xml_find_all(z, ".//Nodo")
dfs<-lapply(parents, function(node){
#find clvs name
clvs <-xml_find_all(node, ".//clv_nodo") %>% xml_text()
#Find all children
valors <- node %>% xml_find_all(".//Valor")
#Find all children in Valores
val <- node %>% xml_find_all(".//Valores")
#Filter clvs by empty nodes in Valores
clvs <- clvs[xml_length(val)>0]
#remove cases where the valors nodes have no children nodes
valors <- valors[xml_length(valors)>0]
valornodes <- lapply(valors, function(node){
#get values and names
values <- xml_children(node) %>% xml_text()
names <- xml_children(node) %>% xml_name()
#make data.frame and name the columns
tempdf<- data.frame(t(values), stringsAsFactors = FALSE)
names(tempdf) <- names
tempdf
})
#made data frame with all of results
df<- bind_rows(valornodes)
df<- cbind(clvs,df)
df
})
q <- do.call(rbind.data.frame, dfs)