How can I parse and expand correctly my XML ID in R

Question

I am trying to parse the child in a tree identify by an ID. This is how my XML looks like:

       <Reporte xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <nombre>PML</nombre>
  <proceso>MDA</proceso>
  <sistema>BCS</sistema>
  <area>PÚBLICA</area>
  <Resultados>
    <Nodo>
      <clv_nodo>07CAB-115</clv_nodo>
      <Valores>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>1</hora>
          <pml>1688.02</pml>
          <pml_ene>1638.38</pml_ene>
          <pml_per>49.64</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>2</hora>
          <pml>1446.18</pml>
          <pml_ene>1405.81</pml_ene>
          <pml_per>40.36</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>3</hora>
          <pml>1389.31</pml>
          <pml_ene>1351.85</pml_ene>
          <pml_per>37.46</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>4</hora>
          <pml>1337.1</pml>
          <pml_ene>1301.93</pml_ene>
          <pml_per>35.17</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>5</hora>
          <pml>1532.75</pml>
          <pml_ene>1492.39</pml_ene>
          <pml_per>40.36</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>6</hora>
          <pml>1729.85</pml>
          <pml_ene>1683.15</pml_ene>
          <pml_per>46.71</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>7</hora>
          <pml>1698.2</pml>
          <pml_ene>1650.29</pml_ene>
          <pml_per>47.92</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>8</hora>
          <pml>1700.84</pml>
          <pml_ene>1649.62</pml_ene>
          <pml_per>51.23</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>9</hora>
          <pml>1708.53</pml>
          <pml_ene>1652.13</pml_ene>
          <pml_per>56.4</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>10</hora>
          <pml>1798.48</pml>
          <pml_ene>1735.19</pml_ene>
          <pml_per>63.29</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>11</hora>
          <pml>1656.64</pml>
          <pml_ene>1595.8</pml_ene>
          <pml_per>60.84</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>12</hora>
          <pml>1712.9</pml>
          <pml_ene>1648.41</pml_ene>
          <pml_per>64.48</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>13</hora>
          <pml>1787.72</pml>
          <pml_ene>1719.13</pml_ene>
          <pml_per>68.59</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>14</hora>
          <pml>1851.01</pml>
          <pml_ene>1779.59</pml_ene>
          <pml_per>71.43</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>15</hora>
          <pml>1950.51</pml>
          <pml_ene>1873.83</pml_ene>
          <pml_per>76.67</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>16</hora>
          <pml>1661.94</pml>
          <pml_ene>1595.87</pml_ene>
          <pml_per>66.07</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>17</hora>
          <pml>1740.8</pml>
          <pml_ene>1671.24</pml_ene>
          <pml_per>69.56</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>18</hora>
          <pml>1895.51</pml>
          <pml_ene>1820.19</pml_ene>
          <pml_per>75.32</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>19</hora>
          <pml>2074.18</pml>
          <pml_ene>1990.16</pml_ene>
          <pml_per>84.02</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>20</hora>
          <pml>1959.91</pml>
          <pml_ene>1878.22</pml_ene>
          <pml_per>81.7</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>21</hora>
          <pml>1791.66</pml>
          <pml_ene>1719.24</pml_ene>
          <pml_per>72.43</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>22</hora>
          <pml>1986.59</pml>
          <pml_ene>1909.79</pml_ene>
          <pml_per>76.8</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>23</hora>
          <pml>1709.51</pml>
          <pml_ene>1648.21</pml_ene>
          <pml_per>61.29</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>24</hora>
          <pml>1539.04</pml>
          <pml_ene>1488.58</pml_ene>
          <pml_per>50.47</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
      </Valores>
    </Nodo>
    <Nodo>
      <clv_nodo>07BLE-115</clv_nodo>
      <Valores>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>1</hora>
          <pml>1646.19</pml>
          <pml_ene>1638.38</pml_ene>
          <pml_per>7.81</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>2</hora>
          <pml>1413.36</pml>
          <pml_ene>1405.81</pml_ene>
          <pml_per>7.55</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>3</hora>
          <pml>1358.96</pml>
          <pml_ene>1351.85</pml_ene>
          <pml_per>7.11</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>4</hora>
          <pml>1308.71</pml>
          <pml_ene>1301.93</pml_ene>
          <pml_per>6.77</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>5</hora>
          <pml>1499.65</pml>
          <pml_ene>1492.39</pml_ene>
          <pml_per>7.27</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>6</hora>
          <pml>1690.93</pml>
          <pml_ene>1683.15</pml_ene>
          <pml_per>7.78</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>7</hora>
          <pml>1659.78</pml>
          <pml_ene>1650.29</pml_ene>
          <pml_per>9.49</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>8</hora>
          <pml>1661.41</pml>
          <pml_ene>1649.62</pml_ene>
          <pml_per>11.79</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>9</hora>
          <pml>1662.14</pml>
          <pml_ene>1652.13</pml_ene>
          <pml_per>10</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>10</hora>
          <pml>1744.71</pml>
          <pml_ene>1735.19</pml_ene>
          <pml_per>9.52</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>11</hora>
          <pml>1604.34</pml>
          <pml_ene>1595.8</pml_ene>
          <pml_per>8.54</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>12</hora>
          <pml>1657.17</pml>
          <pml_ene>1648.41</pml_ene>
          <pml_per>8.76</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>13</hora>
          <pml>1728.18</pml>
          <pml_ene>1719.13</pml_ene>
          <pml_per>9.05</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>14</hora>
          <pml>1789.15</pml>
          <pml_ene>1779.59</pml_ene>
          <pml_per>9.56</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>15</hora>
          <pml>1883.75</pml>
          <pml_ene>1873.83</pml_ene>
          <pml_per>9.92</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>16</hora>
          <pml>1599.54</pml>
          <pml_ene>1595.87</pml_ene>
          <pml_per>3.67</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>17</hora>
          <pml>1674.73</pml>
          <pml_ene>1671.24</pml_ene>
          <pml_per>3.49</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>18</hora>
          <pml>1823.54</pml>
          <pml_ene>1820.19</pml_ene>
          <pml_per>3.35</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>19</hora>
          <pml>1992.57</pml>
          <pml_ene>1990.16</pml_ene>
          <pml_per>2.41</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>20</hora>
          <pml>1885.48</pml>
          <pml_ene>1878.22</pml_ene>
          <pml_per>7.26</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>21</hora>
          <pml>1726.7</pml>
          <pml_ene>1719.24</pml_ene>
          <pml_per>7.46</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>22</hora>
          <pml>1914.07</pml>
          <pml_ene>1909.79</pml_ene>
          <pml_per>4.28</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>23</hora>
          <pml>1653.54</pml>
          <pml_ene>1648.21</pml_ene>
          <pml_per>5.32</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
        <Valor>
          <fecha>2017-03-15</fecha>
          <hora>24</hora>
          <pml>1495.33</pml>
          <pml_ene>1488.58</pml_ene>
          <pml_per>6.75</pml_per>
          <pml_cng>0</pml_cng>
        </Valor>
      </Valores>
    </Nodo>
  </Resultados>
  <status>OK</status>
</Reporte>

I am trying to parse "Nodo" data by node index and expand it to the number of its child "Valor" elements, then column bind with "Valor" data.
I am literally using the description from this post: [R XML - combining parent and child nodes into data frame

So far my code looks like this:

library(XML)
library(plyr)
library(purrr)
api <- function(path) {
  url1 <- modify_url("https://ws01.cenace.gob.mx", port = "8082", path = path)
  GET(url1)
}
resp <- api("/SWPML/SIM/BCS/MDA/07CAB-115,07BLE-115/2017/03/15/2017/03/15/XML")
url1 <- xmlParse(resp)
mtg_num <- length(xpathSApply(url1, "//Nodo"))
#I am using same names as example
#meeting_list is a 0X0 list
meeting_list <- lapply(seq(mtg_num), function(i) {
  races_num <- length(xpathSApply(url1, sprintf("//Resultados[%s]/Valores", i)))

  data.frame(
    meeting_id = rep(xpathSApply(url1, sprintf("//clv_nodo", i)), races_num)
  )
})
final_df <- cbind(do.call(rbind, meeting_list),
                  xmlToDataFrame(nodes = getNodeSet(url1, "//Valores/Valor")),
                  XML:::xmlAttrsToDataFrame(getNodeSet(url1, "//Valores/Valor")))

Since meeting_list is 0X0 when I try to bind it to "Valor" it gives me an error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 0, 48
In addition: Warning message:
In combineNamedVectors(lapply(doc, xmlAttrs), attrs, omit, ...) :
  no elements to combine across records

How can I possible parse correctly "clv_nodo" and expand it to the 48 values in "Valores".

My desire data frame looks like this:

   clv_nodo       fecha hora     pml pml_ene pml_per pml_cng
1  07CAB-115 2017-03-15    1 1688.02 1638.38   49.64       0
2  07CAB-115 2017-03-15    2 1446.18 1405.81   40.36       0
3  07CAB-115 2017-03-15    3 1389.31 1351.85   37.46       0
4  07CAB-115 2017-03-15    4  1337.1 1301.93   35.17       0
5  07CAB-115 2017-03-15    5 1532.75 1492.39   40.36       0
6  07CAB-115 2017-03-15    6 1729.85 1683.15   46.71       0
7  07CAB-115 2017-03-15    7  1698.2 1650.29   47.92       0
8  07CAB-115 2017-03-15    8 1700.84 1649.62   51.23       0
9  07CAB-115 2017-03-15    9 1708.53 1652.13    56.4       0
10 07CAB-115 2017-03-15   10 1798.48 1735.19   63.29       0
11 07CAB-115 2017-03-15   11 1656.64  1595.8   60.84       0
12 07CAB-115 2017-03-15   12  1712.9 1648.41   64.48       0
13 07CAB-115 2017-03-15   13 1787.72 1719.13   68.59       0
14 07CAB-115 2017-03-15   14 1851.01 1779.59   71.43       0
15 07CAB-115 2017-03-15   15 1950.51 1873.83   76.67       0
16 07CAB-115 2017-03-15   16 1661.94 1595.87   66.07       0
17 07CAB-115 2017-03-15   17  1740.8 1671.24   69.56       0
18 07CAB-115 2017-03-15   18 1895.51 1820.19   75.32       0
19 07CAB-115 2017-03-15   19 2074.18 1990.16   84.02       0
20 07CAB-115 2017-03-15   20 1959.91 1878.22    81.7       0
21 07CAB-115 2017-03-15   21 1791.66 1719.24   72.43       0
22 07CAB-115 2017-03-15   22 1986.59 1909.79    76.8       0
23 07CAB-115 2017-03-15   23 1709.51 1648.21   61.29       0
24 07CAB-115 2017-03-15   24 1539.04 1488.58   50.47       0
25 07BLE-115 2017-03-15    1 1646.19 1638.38    7.81       0
26 07BLE-115 2017-03-15    2 1413.36 1405.81    7.55       0
27 07BLE-115 2017-03-15    3 1358.96 1351.85    7.11       0
28 07BLE-115 2017-03-15    4 1308.71 1301.93    6.77       0
29 07BLE-115 2017-03-15    5 1499.65 1492.39    7.27       0
30 07BLE-115 2017-03-15    6 1690.93 1683.15    7.78       0
31 07BLE-115 2017-03-15    7 1659.78 1650.29    9.49       0
32 07BLE-115 2017-03-15    8 1661.41 1649.62   11.79       0
33 07BLE-115 2017-03-15    9 1662.14 1652.13      10       0
34 07BLE-115 2017-03-15   10 1744.71 1735.19    9.52       0
35 07BLE-115 2017-03-15   11 1604.34  1595.8    8.54       0
36 07BLE-115 2017-03-15   12 1657.17 1648.41    8.76       0
37 07BLE-115 2017-03-15   13 1728.18 1719.13    9.05       0
38 07BLE-115 2017-03-15   14 1789.15 1779.59    9.56       0
39 07BLE-115 2017-03-15   15 1883.75 1873.83    9.92       0
40 07BLE-115 2017-03-15   16 1599.54 1595.87    3.67       0
41 07BLE-115 2017-03-15   17 1674.73 1671.24    3.49       0
42 07BLE-115 2017-03-15   18 1823.54 1820.19    3.35       0
43 07BLE-115 2017-03-15   19 1992.57 1990.16    2.41       0
44 07BLE-115 2017-03-15   20 1885.48 1878.22    7.26       0
45 07BLE-115 2017-03-15   21  1726.7 1719.24    7.46       0
46 07BLE-115 2017-03-15   22 1914.07 1909.79    4.28       0
47 07BLE-115 2017-03-15   23 1653.54 1648.21    5.32       0
48 07BLE-115 2017-03-15   24 1495.33 1488.58    6.75       0

UPDATE Final answer

Dave2e's code worked perfectly for the two IDS that I am parsing in this example (07CAB-115, 07BLE-115) but I need to parse 2500 different IDS. Some of them have empty nodes at a "Valores" node. For example for this one:

<Nodo>
      <clv_nodo>07ASJ-115</clv_nodo>
      <Valores>

so when I ran Dave2e code I got:

 Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 1, 0

This is because in the last part I get a nested list with the last one being 0x0 df<- bind_rows(valornodes) then I was binding the ID with this empty list. The solution, as Dave2e suggested, was filtering clvs by the empty nodes in Valores. In this way the match between the nested list and the list of ID's is correct. This is the final code:

 api <- function(path) {
      url1 <- modify_url("https://ws01.cenace.gob.mx", port = "8082", path = path)
      GET(url1)
    }
    resp <- api("/SWPML/SIM/BCS/MDA/07CAB-115,07BLE-115,07CAD-115,07ASJ-115/2017/03/26/2017/04/01/XML")
z <- read_xml(resp)
parents <-xml_find_all(z, ".//Nodo")
dfs<-lapply(parents, function(node){
  #find clvs name
  clvs <-xml_find_all(node, ".//clv_nodo") %>% xml_text()
  #Find all children
  valors <- node %>% xml_find_all(".//Valor")
#Find all children in Valores  
val <- node %>% xml_find_all(".//Valores")

#Filter clvs by empty nodes in Valores  
clvs <- clvs[xml_length(val)>0]


  #remove cases where the valors nodes have no children nodes
  valors <- valors[xml_length(valors)>0]

  valornodes <- lapply(valors, function(node){
    #get values and names
    values <- xml_children(node) %>% xml_text()
    names <- xml_children(node) %>% xml_name()

    #make data.frame and name the columns
    tempdf<- data.frame(t(values), stringsAsFactors = FALSE)
    names(tempdf) <- names
    tempdf
  })
  #made data frame with all of results
  df<- bind_rows(valornodes)
  df<- cbind(clvs,df)
df

})

q <- do.call(rbind.data.frame, dfs)

score 2 · Answer 1 · answered May 14 '20 at 15:36

I have this code i write some time ago to check all the XML in a file and collect specific nodes of a pattern of XML, with a little tweak you can use something maybe.

library("xml2")
library("XML")

setwd("/xml")
dir <- dir()
tabela=matrix(NA,nrow=length(a),ncol=1)

  for(i in 1:length(dir)){

  visitNode <- function(node) {#Recursive Function to visit the XML tree (depth first)
    if (is.null(node)) {#leaf node reached. Turn back
      return()
    }
    print(paste("Node: ", xmlName(node)))
      num.children = xmlSize(node)

    if(num.children == 0 ) {# Add your code to process the leaf node here
      print(      paste("   ", xmlValue(node)))
    }
    if (num.children > 0){#Go one level deeper
      for (i in 1 : num.children) {
        visitNode(node[[i]][["NFe"]]) #the i-th child of node
      }
    }

  }
  xmlfile <- dir[i]
  xtree <- xmlInternalTreeParse(xmlfile)
  root <- xmlRoot(xtree)
  dataxml <- visitNode(root)
  dataxml <- xmlToList(root)


  df<- as.data.frame(matrix(unlist(dataxml$NFe$infNFe$infAdic$infCpl), nrow=length(dataxml$NFe$infNFe$infAdic$infCpl),byrow=TRUE))

Hi FrakTool, I have been trying to run this code since my data has some empty nodes as Dave2e warned me. I can't figure out what "NFe". Also I need to process two leaf one is clv_nodo and the other one is Valor. Do you have any possible hint — Lilia, May 15 '20 at 12:05
NFe is a product of XML structure on my .xml file, just remove `[["NFe"]]` part in `visitNode(node[[i]][["NFe"]])` on my case a just can get all nodes of NFe if I access product of xml via `i` in node funciton, you can create a list to store the names of the nodes you wand to use, and then aply this list with node names using the function and lapply to store all information you want. that will do the trick. — FrakTool, May 15 '20 at 16:32

Dave2e · Accepted Answer · 2020-05-16T03:59:39.963

Here is a solution using the xml2 package.
This solution assumes very Valor node has the same subnodes. If that is not the case then this solution will fail.

library(xml2)
library(dplyr)

#find parent nodes
parents <-xml_find_all(page, ".//Nodo")

#parse each child
dfs<-lapply(parents, function(node){
  #find clvs name
  clvs <-xml_find_all(node, ".//clv_nodo") %>% xml_text()
  #Find all children
  valors <- node %>% xml_find_all(".//Valor")

  #remove cases where the valors nodes have no children nodes
   valors <- valors[xml_length(valors)>0]

  #get values
  values <- xml_children(valors) %>% xml_text()

  #made data frame with results (assumes no missing child nodes)
  df<- data.frame(matrix(values, ncol=6, byrow=TRUE), stringsAsFactors = FALSE)
  #get node names and rename columns
  names(df)<-node %>% xml_find_first(".//Valor")  %>%  xml_children() %>% xml_name()
  df<- cbind(clvs, df)   #bind on clvs name

  df
})

#Make find answer
answer<-bind_rows(dfs)
head(answer)
#        clvs      fecha hora     pml pml_ene pml_per pml_cng
# 1 07CAB-115 2017-03-15    1 1688.02 1638.38   49.64       0
# 2 07CAB-115 2017-03-15    2 1446.18 1405.81   40.36       0
# 3 07CAB-115 2017-03-15    3 1389.31 1351.85   37.46       0
# 4 07CAB-115 2017-03-15    4  1337.1 1301.93   35.17       0
# 5 07CAB-115 2017-03-15    5 1532.75 1492.39   40.36       0
# 6 07CAB-115 2017-03-15    6 1729.85 1683.15   46.71       0

Update
If the number of sub nodes to Valor changes or is not consistent. Then substitute in the below script in for the lapply above. It does become a loop within a loop so performance will be impacted.

#parse each child
dfs<-lapply(parents, function(node){
  #find clvs name
  clvs <-xml_find_all(node, ".//clv_nodo") %>% xml_text()
  #Find all children
  valors <- node %>% xml_find_all(".//Valor")

  #remove cases where the valors nodes have no children nodes
   valors <- valors[xml_length(valors)>0]

  valornodes <- lapply(valors, function(node){
    #get values and names
    values <- xml_children(node) %>% xml_text()
    names <- xml_children(node) %>% xml_name()

    #make data.frame and name the columns
    tempdf<- data.frame(t(values), stringsAsFactors = FALSE)
    names(tempdf) <- names
    tempdf
  })
  #made data frame with all of results
  df<- bind_rows(valornodes)

  df<- cbind(clvs, df)
  df
})

@Lilia, The function `xml_length` returns the count of the number of children nodes, you can use this to filter out the zero length nodes. I added `valors[xml_length(valors)>0]` to remove cases where there are no child nodes to the vadors node. It should work now. — Dave2e, May 16 '20 at 03:59
I ran the code and I am getting the same error for one day but if I extend the period to 7 days (I will be running the code for 4 years period) I get a list instead of a data frame, please check my update in the original post — Lilia, May 19 '20 at 14:44
This will return the `parents %>% xml_find_all(".//Valores") %>% xml_length()` of children nodes in Valores, then you can filter out the "Nodo" without will empty nodes. — Dave2e, May 19 '20 at 14:59
Thank you @Dave2e finally I got it! I updated my post with the solution. You were right `xml_length()` filtered the empty nodes. — Lilia, May 20 '20 at 16:41

How can I parse and expand correctly my XML ID in R

2 Answers2