
I have an XML database with a structure like this:

<Record>
    <Id>
      {text}
         <AdresDoDoreczen>
                <Miejscowosc>
                  {text}
                <Budynek>
                  {text}
                <KodPocztowy>
                  {text}
                <Poczta>
                  {text}
                <Gmina>
                  {text}
                <Powiat>
                  {text}
                <Wojewodztwo>
                  {text}

I use the following code:

library(xml2)

file <- "file.xml"
# read_xml() takes the path directly; useInternalNodes is an option of the XML package's parsers, not of xml2
doc <- read_xml(file)
column1 <- xml_text(xml_find_all(doc, '//AdresDoDoreczen/Miejscowosc'))

Applying the same code to the other nodes (Budynek, Powiat, Gmina) gives me several vectors, one per column, which I combine with matrix() and save as a .csv file.

Unfortunately, a few of the records are missing some nodes, so xml_find_all(doc, '//AdresDoDoreczen/Gmina') returns 95 values instead of 100. An empty "Gmina" node is fine, but when the node doesn't exist at all, that record contributes no value, and the whole matrix becomes misaligned.
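To make the misalignment concrete, here is a minimal sketch with a hypothetical three-record sample (the `<Records>` root element and the place names are made up for illustration, and `<Id>` and `<AdresDoDoreczen>` are assumed to be siblings inside each `<Record>`). The second record has no `<Gmina>` node. Because `xml_find_all()` searches the whole document, it returns one value per node it finds, with no placeholder for the record that lacks one.

```r
library(xml2)

# Hypothetical sample: three records, the second has no <Gmina> element at all.
sample_xml <- "<Records>
  <Record><Id>1</Id><AdresDoDoreczen>
    <Miejscowosc>Krakow</Miejscowosc><Gmina>Krakow</Gmina>
  </AdresDoDoreczen></Record>
  <Record><Id>2</Id><AdresDoDoreczen>
    <Miejscowosc>Lodz</Miejscowosc>
  </AdresDoDoreczen></Record>
  <Record><Id>3</Id><AdresDoDoreczen>
    <Miejscowosc>Gdansk</Miejscowosc><Gmina>Gdansk</Gmina>
  </AdresDoDoreczen></Record>
</Records>"

doc <- read_xml(sample_xml)

# Document-wide searches: one value per matching node, regardless of record.
miejscowosc <- xml_text(xml_find_all(doc, "//AdresDoDoreczen/Miejscowosc"))
gmina       <- xml_text(xml_find_all(doc, "//AdresDoDoreczen/Gmina"))

length(miejscowosc)  # 3
length(gmina)        # 2, and nothing says that record 2 is the one missing

# data.frame(miejscowosc, gmina) stops with "differing number of rows", while
# cbind(miejscowosc, gmina) recycles gmina (with only a warning), so row 2
# would show "Gdansk" and row 3 "Krakow".
```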

Any ideas on how to deal with these missing nodes?
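One common way to handle this with xml2 (a sketch under stated assumptions, not the asker's eventual solution) is to select the `<Record>` nodes first and then look up each field relative to every record with `xml_find_first()`. It returns exactly one result per input node; for a record without a match it returns a missing node, and `xml_text()` turns that into `NA`, so all columns have the same length. The sample data is hypothetical, the relative path `.//AdresDoDoreczen/...` assumes the address block sits somewhere below each `<Record>`, and `tempfile()` stands in for a real output path.

```r
library(xml2)

# Hypothetical sample: three records, the second has no <Gmina> element at all.
sample_xml <- "<Records>
  <Record><Id>1</Id><AdresDoDoreczen>
    <Miejscowosc>Krakow</Miejscowosc><Gmina>Krakow</Gmina>
  </AdresDoDoreczen></Record>
  <Record><Id>2</Id><AdresDoDoreczen>
    <Miejscowosc>Lodz</Miejscowosc>
  </AdresDoDoreczen></Record>
  <Record><Id>3</Id><AdresDoDoreczen>
    <Miejscowosc>Gdansk</Miejscowosc><Gmina>Gdansk</Gmina>
  </AdresDoDoreczen></Record>
</Records>"

doc <- read_xml(sample_xml)

# Select the records first, then look up every field relative to each record.
records <- xml_find_all(doc, "//Record")

# Field names taken from the structure shown in the question.
fields <- c("Miejscowosc", "Budynek", "KodPocztowy", "Poczta",
            "Gmina", "Powiat", "Wojewodztwo")

# xml_find_first() yields one result per record; a record with no match gives
# a missing node, which xml_text() converts to NA.
columns <- lapply(fields, function(f) {
  xml_text(xml_find_first(records, paste0(".//AdresDoDoreczen/", f)))
})
names(columns) <- fields

addresses <- data.frame(Id = xml_text(xml_find_first(records, ".//Id")),
                        columns, stringsAsFactors = FALSE)
print(addresses)

# Write the aligned table; tempfile() is a placeholder for your own path.
out_file <- tempfile(fileext = ".csv")
write.csv(addresses, out_file, row.names = FALSE)
```

Fields that are absent from a record (here Gmina in record 2, and every field the sample omits entirely) show up as `NA` instead of shifting the following rows.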

m_slaav
  • Similar to this question: https://stackoverflow.com/questions/61541601/scraping-and-extracting-xml-sitemap-elements-using-r-and-rvest/61545930#61545930 – Dave2e May 25 '20 at 13:26
  • I undeleted my answer at the link Dave2e provided; maybe you can use it. – FrakTool May 25 '20 at 13:35
  • Combining Dave's answer with https://github.com/r-lib/xml2/issues/237 gave what I think are great results, but I'd have to analyze them and, to be honest, learn what I actually did there. I had seen that post before, but "tibble" scared me; it turned out to be quite friendly after all. – m_slaav May 25 '20 at 14:34
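The combination described in the last comment (Dave2e's answer plus the linked xml2 issue, built with tibbles) is not reproduced here. As a rough base-R illustration of the same general idea, which is an assumption about what that combination amounts to, each record can be turned into name/value pairs and the rows then bound together, padding the fields a record lacks with `NA`. This variant does not need the field names in advance; it discovers them from the data. The sample document is hypothetical.

```r
library(xml2)

# Hypothetical sample: three records, the second has no <Gmina> element at all.
sample_xml <- "<Records>
  <Record><Id>1</Id><AdresDoDoreczen>
    <Miejscowosc>Krakow</Miejscowosc><Gmina>Krakow</Gmina>
  </AdresDoDoreczen></Record>
  <Record><Id>2</Id><AdresDoDoreczen>
    <Miejscowosc>Lodz</Miejscowosc>
  </AdresDoDoreczen></Record>
  <Record><Id>3</Id><AdresDoDoreczen>
    <Miejscowosc>Gdansk</Miejscowosc><Gmina>Gdansk</Gmina>
  </AdresDoDoreczen></Record>
</Records>"

doc <- read_xml(sample_xml)
records <- xml_find_all(doc, "//Record")

# One named character vector per record: child element name -> text.
# A record that lacks a child simply has no entry for that name.
rows <- lapply(records, function(rec) {
  kids <- xml_find_all(rec, ".//AdresDoDoreczen/*")
  setNames(xml_text(kids), xml_name(kids))
})

# Every field name seen in at least one record, in first-seen order.
all_fields <- unique(unlist(lapply(rows, names)))

# Indexing by a name that is absent returns NA, which pads the missing fields.
mat <- do.call(rbind, lapply(rows, function(r) unname(r[all_fields])))
colnames(mat) <- all_fields

addresses <- data.frame(Id = xml_text(xml_find_first(records, ".//Id")),
                        mat, stringsAsFactors = FALSE)
print(addresses)
```

Packages such as tibble and dplyr offer helpers for the row-binding step; the sketch sticks to base R so it runs with xml2 alone.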
