Import xml to R

Question

I have election results data in xml files I am trying to import into R. This is my first time ever working with xml files but I haven't the foggiest idea what is up with the .xls version of the data I can download so I'm attempting to work with the xml.

There isn't a direct link to the xml file, but it can be accessed here https://results.enr.clarityelections.com/IL/Bloomington/109017/web.276013/#/summary on the right side by scrolling down a bit to "Reports" and downloading "Detail XML".

I've been trying to use xml2 to get it into a data frame. I can read_xml then turn it into a list but after that my attempts have given me only a variety of errors or more lists with a lot of NULLs. It's possible the weirdness is being caused by the xml file itself, but I don't know enough about them to know if that is the case.

Does it have to be an xml? Not a text file? I see there is a text file option. Just curious why it has to be xml — neuron, Nov 20 '21 at 03:51
Also, all the data shows up in the files like it does on the site. There is a section for each position (i.e. mayor, city council, township assessor, etc.) with gaps between each election result. R won't like this format. On top of all that, each section is structured a bit differently, with the names of the people who ran for office and who won. R needs a more or less constant column structure or it will have trouble reading in the file. If you download the text file and open it up you will be able to see what I am talking about — neuron, Nov 20 '21 at 03:59
It sounds like the data format may take some reshaping once you load it. `https://nacnudus.github.io/unpivotr/` offers some tools to help wrangle spreadsheets made for human viewing into more analysis-friendly form. — Jon Spring, Nov 20 '21 at 04:25
@neuron I ended up using xslt to restructure the xml to have a sort of column structure (single level of nodes with matching attribute sets) that made it super easy to make into a data frame. I added an answer with my solution with the specifics. — Abigail, Dec 08 '21 at 02:46

score 0 · Answer 1 · answered Nov 20 '21 at 10:29

No desired output was given, so here is something to get started with.

Goal: extract the voter turnout data by precinct (the first part of the xml).

library(tidyverse)
library(xml2)

doc <- xml2::read_xml("./detail.xml")

# get the voter turnout nodes
nodes <- xml2::xml_find_all(doc, ".//VoterTurnout/Precincts/Precinct")

# build df: one row per precinct, one column per attribute
df <- nodes %>%
  purrr::map(xml2::xml_attrs) %>%
  purrr::map_df(as.list)

# A tibble: 52 x 5
   name        totalVoters ballotsCast voterTurnout percentReporting
   <chr>       <chr>       <chr>       <chr>        <chr>
 1 Precinct 1  709         185         26.09        4
 2 Precinct 2  932         154         16.52        4
 3 Precinct 3  849         292         34.39        4
 4 Precinct 4  1128        178         15.78        4
 5 Precinct 5  846         165         19.50        4
 6 Precinct 6  1437        188         13.08        4
 7 Precinct 7  1459        165         11.31        4
 8 Precinct 8  558         193         34.59        4
 9 Precinct 9  1320        183         13.86        4
10 Precinct 10 1292        444         34.37        4
...
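
Every column in that result is character, because XML attribute values are always read as text. As a minimal sketch of one way to get numeric columns, the example below uses base R's type.convert() on a small made-up sample that only mimics the VoterTurnout part of the file (the sample values and the name turnout are assumptions for illustration, not the real data):

```r
library(xml2)

# Made-up sample that mimics the VoterTurnout part of the file (not real data)
sample_xml <- '<ElectionResult>
  <VoterTurnout totalVoters="1641" ballotsCast="339">
    <Precincts>
      <Precinct name="Precinct 1" totalVoters="709" ballotsCast="185" voterTurnout="26.09" percentReporting="4"/>
      <Precinct name="Precinct 2" totalVoters="932" ballotsCast="154" voterTurnout="16.52" percentReporting="4"/>
    </Precincts>
  </VoterTurnout>
</ElectionResult>'

doc <- read_xml(sample_xml)
nodes <- xml_find_all(doc, ".//VoterTurnout/Precincts/Precinct")

# Each node's attribute vector becomes a one-row data frame; rbind stacks them
rows <- lapply(nodes, function(n) {
  as.data.frame(as.list(xml_attrs(n)), stringsAsFactors = FALSE)
})
turnout <- do.call(rbind, rows)

# Convert columns that look numeric; as.is = TRUE keeps text as character
turnout[] <- lapply(turnout, type.convert, as.is = TRUE)
str(turnout)
```

With the tidyverse already loaded, readr::type_convert(df) should do the same job on the tibble above.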

score 0 · Accepted Answer · answered Dec 08 '21 at 02:35

Here's the solution I ended up with: use XSLT to restructure the xml before trying to construct a data frame. The basics of the solution came from "R: convert XML data to data frame" (coincidentally also about election data).

XSLT - Restructured it to just be one long list of every precinct node with the applicable info from their choice, contest, and votetype ancestors as attributes.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/ElectionResult">
    <xsl:copy>
      <xsl:apply-templates select="descendant::Precinct"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Precinct">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="election">
        <xsl:value-of select="ancestor::ElectionResult/ElectionName"/>
      </xsl:attribute>
      <xsl:attribute name="contest">
        <xsl:value-of select="ancestor::Contest/@text"/>
      </xsl:attribute>
      <xsl:attribute name="choice">
        <xsl:value-of select="ancestor::Choice/@text"/>
      </xsl:attribute>
      <xsl:attribute name="votetype">
        <xsl:value-of select="ancestor::VoteType[1]/@name"/>
      </xsl:attribute>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

R - The xslt package works as an extension for xml2 to apply the .xsl file.

library(xml2)
library(xslt)
library(tidyverse)

# Parse XML and XSL
xml <- read_xml("electionresults.xml")
style <- read_xml("style.xsl")

# Transform XML
new_xml <- xslt::xml_xslt(xml, style)

# Build data frame
elections <- new_xml %>% 
  xml_find_all("//Precinct") %>% 
  map_dfr(~list(election = xml_attr(., "election"),
                contest = xml_attr(., "contest"),
                choice = xml_attr(., "choice"),
                votetype = xml_attr(., "votetype"),
                precinct = xml_attr(., "name"),
                votes = xml_attr(., "votes"))) %>% 
  type_convert()

The mapping process for building the data frame came from "R XML - combining parent and child nodes into data frame".
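
For comparison, the same ancestor lookups can probably be done in xml2 alone, without the XSLT step. This is not the approach used above, just a rough sketch of an alternative. It assumes the nesting Contest > Choice > VoteType > Precinct that the stylesheet relies on, runs on a made-up sample rather than the real file, and uses a hypothetical helper named ancestor_attr. Unlike the stylesheet, it only selects precincts below a Choice, so the turnout precincts are left out.

```r
library(xml2)

# Made-up sample in the assumed shape Contest > Choice > VoteType > Precinct
sample_xml <- '<ElectionResult>
  <ElectionName>Sample Election</ElectionName>
  <Contest text="Mayor">
    <Choice text="Candidate A">
      <VoteType name="Election Day">
        <Precinct name="Precinct 1" votes="10"/>
        <Precinct name="Precinct 2" votes="20"/>
      </VoteType>
    </Choice>
    <Choice text="Candidate B">
      <VoteType name="Election Day">
        <Precinct name="Precinct 1" votes="5"/>
        <Precinct name="Precinct 2" votes="7"/>
      </VoteType>
    </Choice>
  </Contest>
</ElectionResult>'

doc <- read_xml(sample_xml)

# Only precincts below a Choice; turnout precincts have no contest or choice
precincts <- xml_find_all(doc, "//Choice//Precinct")

# Hypothetical helper: read an attribute from each node's nearest ancestor
# of the given element name (NA where no such ancestor exists)
ancestor_attr <- function(nodes, element, attr) {
  xml_attr(xml_find_first(nodes, paste0("ancestor::", element, "[1]")), attr)
}

elections <- data.frame(
  election = xml_text(xml_find_first(doc, "/ElectionResult/ElectionName")),
  contest  = ancestor_attr(precincts, "Contest", "text"),
  choice   = ancestor_attr(precincts, "Choice", "text"),
  votetype = ancestor_attr(precincts, "VoteType", "name"),
  precinct = xml_attr(precincts, "name"),
  votes    = as.integer(xml_attr(precincts, "votes")),
  stringsAsFactors = FALSE
)
print(elections)
```

Whether this beats the XSLT route likely depends on file size and taste; the stylesheet keeps the reshaping logic outside R, while this keeps everything in one script.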
