I am trying to make the solution from "xml with nested siblings to data frame in R" work on data with repeated nested nodes (I'm not an XML person, so I'm not sure that is the right term).
The code referenced above returns only the first cycle, but I need to import all of the cycles.

The data look like this:

 <Record>
<LastName>REDACTED</LastName>
<FirstName>REDACTED</FirstName>
<DOB>REDACTED</DOB>
<Rapsheet>
  <Header>
    <DateOfBirth>REDACTED</DateOfBirth>
    <SID>REDACTED</SID>
    <Summary>
      <DateOfLastArrest>
        10/01/2012
      </DateOfLastArrest>
      <AgeOfOffender>21</AgeOfOffender>
      <FailuresToAppear>0</FailuresToAppear>
      <ViolationOfCourtOrdersOrConditions>
        0
      </ViolationOfCourtOrdersOrConditions>
      <FelonyArrestsConvictions>
        0/0
      </FelonyArrestsConvictions>
      <MisdemeanorArrestsConvictions>
        0/0
      </MisdemeanorArrestsConvictions>
      <UnknownOffenseLevelArrestsConvictions>
        1/0
      </UnknownOffenseLevelArrestsConvictions>
      <AssaultOnOfficerCharges>
        0
      </AssaultOnOfficerCharges>
      <DeadlyWeaponRelatedCharges>
        0
      </DeadlyWeaponRelatedCharges>
      <EscapeCharges>
        0
      </EscapeCharges>
      <ViolationOfProbationParoleCharges>
        0/0
      </ViolationOfProbationParoleCharges>
    </Summary>
  </Header>
  <Title>VERMONT CRIMINAL HISTORY</Title>
  <Identification>
    <VermontStateID>REDACTED</VermontStateID>
    <DateOfBirth>REDACTED</DateOfBirth>
    <PlaceOfBirthCity></PlaceOfBirthCity>
    <PlaceOfBirthStateOrCountry></PlaceOfBirthStateOrCountry>
    <Sex>F</Sex>
    <Race>W</Race>
    <Ethnicity>
    </Ethnicity>
    <USCitizen></USCitizen>
    <Height>503</Height>
    <Weight>180</Weight>
    <EyeColor>GRN</EyeColor>
    <HairColor>BLN</HairColor>
    <ScarsMarksTattoos>
      <SMTCode>TATTOO</SMTCode>
      <SMTDescription>ARABIC TATOO ON ARM</SMTDescription>
    </ScarsMarksTattoos>
    <ScarsMarksTattoos>
      <SMTCode>TATTOO</SMTCode>
      <SMTDescription>NOSE RING LIP RINGS</SMTDescription>
    </ScarsMarksTattoos>
    <PrintsNCIC></PrintsNCIC>
    <HenryUp></HenryUp>
    <HenryLow></HenryLow>
    <PhotoAvailable></PhotoAvailable>
    <Address>
      <Street>REDACTED</Street>
      <City>WINOOSKI</City>
      <State>VT</State>
      <Zip>05404</Zip>
    </Address>
  </Identification>
  <CriminalHistory>
    <Cycle>
      <CycleNumber>1</CycleNumber>
      <TrackingNumber>1709462</TrackingNumber>
        <Arrest>
          <DateOfArrest>10/01/2012 </DateOfArrest>
          <ArrestAgency>WINOOSKI PD VT0040400</ArrestAgency>
          <ArrestAgencyCaseNumber>12WS04470</ArrestAgencyCaseNumber>
          <Fingerprint>NO</Fingerprint>
          <Charge>
            <ChargeNumber>01</ChargeNumber>
            <ChargeDescription></ChargeDescription>
            <Statute></Statute>
            <Severity></Severity>
          </Charge>
        </Arrest>
        <Arraignment>
          <ArraignmentDate>04/18/2014</ArraignmentDate>
          <ArraignmentAgency>CHITTENDEN CO. DISTRICT COURT</ArraignmentAgency>
          <DocketNumber>REDACTED</DocketNumber>
          <Charge>
            <ChargeNumber>01</ChargeNumber>
            <ChargeDescription>ASSAULT-AGG DOMESTIC-1ST DEG WITH WEAPON</ChargeDescription>
            <Statute>13V1043A2</Statute>
            <Severity>FELONY</Severity>
          </Charge>
        </Arraignment>
        <CourtDisposition>
            <ChargeNumber>01</ChargeNumber>
            <Convicted>NO</Convicted>
            <Felony>NO</Felony>
            <ChargeDescription>ASSAULT-AGG DOMESTIC-1ST DEG WITH WEAPON</ChargeDescription>
            <Statute>13V1043A2</Statute>
            <Disposition>
              06/09/2014 CASE DISMISSED
            </Disposition>
        </CourtDisposition>
    </Cycle>
    <Cycle>
      <CycleNumber>2</CycleNumber>
      <TrackingNumber>1685833</TrackingNumber>
        <Arrest>
          <DateOfArrest>09/30/2012 </DateOfArrest>
          <ArrestAgency>WINOOSKI PD VT0040400</ArrestAgency>
          <ArrestAgencyCaseNumber>12WS004770</ArrestAgencyCaseNumber>
          <Fingerprint>NO</Fingerprint>
        </Arrest>
        <Arraignment>
          <ArraignmentDate>10/01/2012</ArraignmentDate>
          <ArraignmentAgency>CHITTENDEN CO. DISTRICT COURT</ArraignmentAgency>
          <DocketNumber>REDACTED</DocketNumber>
          <Charge>
            <ChargeNumber>01</ChargeNumber>
            <ChargeDescription>ASSAULT-AGG DOMESTIC-1ST DEG WITH WEAPON</ChargeDescription>
            <Statute>13V1043A2</Statute>
            <Severity>FELONY</Severity>
          </Charge>
        </Arraignment>
        <CourtDisposition>
            <ChargeNumber>01</ChargeNumber>
            <Convicted>NO</Convicted>
            <Felony>NO</Felony>
            <ChargeDescription>ASSAULT-AGG DOMESTIC-1ST DEG WITH WEAPON</ChargeDescription>
            <Statute>13V1043A2</Statute>
            <Disposition>
              12/02/2013 CASE DISMISSED
            </Disposition>
        </CourtDisposition>
    </Cycle>
  </CriminalHistory>
</Rapsheet>
</Record>

Thank you in advance

Community
  • What have you tried? Please don't expect people to just write all of your code for you with no attempts. (Because XML is flexible enough to *not* be simply columnar- or row-oriented data, there is no generic solution to the task.) – r2evans Mar 24 '17 at 23:39
  • Before I found the link above, I had tried pretty much everything that poster had tried. I don't understand the solution well enough to modify it to meet my needs. – Robin Weber Mar 24 '17 at 23:45

1 Answer

Consider multiple calls to xmlToDataFrame from the XML package, iterating over the Cycle nodes by position. Using lapply you can build a list of data frames, and the plyr package's rbind.fill() fills missing columns with NA when row binding, which is needed for any missing nodes such as the Charge under Arrest in the second Cycle.

library(XML)
library(plyr)

doc <- xmlParse("path/To/XML.xml")    
cyclelen <- length(xpathSApply(doc, "//Cycle"))

dfList <- lapply(seq_len(cyclelen), function(i) {

  identification <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[",i,"]/../../Identification")))
  names(identification) <- paste0("Identification.", names(identification))

  cycle <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[",i,"]")))
  names(cycle) <- paste0("Cycle.", names(cycle))

  arrest <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[",i,"]/Arrest")))
  names(arrest) <- paste0("Arrest.", names(arrest))

  arraign <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[",i,"]/Arraignment")))
  names(arraign) <- paste0("Arraignment.", names(arraign))

  court <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[",i,"]/CourtDisposition")))
  names(court) <- paste0("CourtDisposition.", names(court))

  cbind(identification, cycle, arrest, arraign, court)  

})

df <- rbind.fill(dfList)
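
To see why rbind.fill() is used here rather than base rbind(), consider a minimal sketch with two hand-made one-row data frames standing in for the per-cycle results. The values are taken from the sample XML, and the reduced column set is an assumption made for illustration. The second frame lacks Arrest.Charge, as the second Cycle does; rbind.fill() pads it with NA, whereas rbind() would stop with an error about mismatched columns:

```r
library(plyr)

# Hand-made stand-ins for two per-cycle results (illustrative subset of columns)
cycle1 <- data.frame(Cycle.CycleNumber = "1",
                     Arrest.Charge     = "01",
                     stringsAsFactors  = FALSE)
cycle2 <- data.frame(Cycle.CycleNumber = "2",
                     stringsAsFactors  = FALSE)

# Columns missing from a frame are filled with NA during the row bind
filled <- rbind.fill(list(cycle1, cycle2))
print(filled)
```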

Alternatively, the DRY-er version:

dfList2 <- lapply(seq_len(cyclelen), function(i) {

  do.call(cbind,
      lapply(c("Identification", "Cycle", "Arrest", "Arraignment", "CourtDisposition"), function(n){ 
        if (n=="Identification") {
          df <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[",i,"]/../../", n)))
        } else if (n=="Cycle") {
          df <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[",i,"]")))
        } else {
          df <- xmlToDataFrame(doc, nodes = getNodeSet(doc, paste0("//Cycle[",i,"]/", n)))
        }
        names(df) <- paste0(n, ".", names(df))
        return(df)
       })
    )
})

df2 <- rbind.fill(dfList2)

all.equal(df, df2)
# TRUE

Do note: this approach creates the fields Cycle.Arrest, Cycle.Arraignment, and Cycle.CourtDisposition with all child node text concatenated, but those values are also separated into individual columns anyway, so simply remove these three.
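
Dropping the concatenated columns can be done by name. A minimal sketch, assuming the columns are named with the Cycle. prefix followed by the child node name; the toy data frame df_demo is a hand-made stand-in with placeholder values, not real output:

```r
# Hand-made stand-in for the combined result (placeholder values)
df_demo <- data.frame(Cycle.CycleNumber      = c("1", "2"),
                      Cycle.Arrest           = c("(concatenated text)", "(concatenated text)"),
                      Cycle.Arraignment      = c("(concatenated text)", "(concatenated text)"),
                      Cycle.CourtDisposition = c("(concatenated text)", "(concatenated text)"),
                      Arrest.DateOfArrest    = c("10/01/2012 ", "09/30/2012 "),
                      stringsAsFactors       = FALSE)

# Keep every column whose name is not in the drop list
drop_cols <- c("Cycle.Arrest", "Cycle.Arraignment", "Cycle.CourtDisposition")
df_clean  <- df_demo[, !(names(df_demo) %in% drop_cols), drop = FALSE]
print(names(df_clean))
```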

Output (not all fields shown below)

[Screenshot of the data frame output]

Parfait
  • Thank you for your help. This was useful in getting to the cycles, but I then have to merge the cycles with some of the information in the Identification block. People have varying numbers of cycles, but only one Identification header. There is nothing in the cycle block to merge the cycle df to the Identification block, and the binds only work for equal rows, which I won't have. – Robin Weber Mar 25 '17 at 12:03
  • Sorry, I meant the Record block: I need to attach the name/DOB to the cycles. – Robin Weber Mar 25 '17 at 12:42
  • See update. Simply extend to go up two node levels from current *Cycle* for *Identification* element. – Parfait Mar 25 '17 at 14:45
  • Thank you, this has me on the right track for what I need to do. – Robin Weber Mar 26 '17 at 00:21
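
For the follow-up in the comments about attaching the name and DOB from the Record block, one possible sketch is to start from each Cycle node and look upward with the XPath ancestor axis, so every cycle row carries its own person's fields no matter how many cycles that person has. The <Records> root element, the sample names and dates of birth, and the helper first_value() are assumptions made for illustration; only the element names come from the question.

```r
library(XML)

# Made-up sample: two people, with two cycles and one cycle respectively
xml_text <- '<Records>
  <Record><LastName>DOE</LastName><FirstName>JANE</FirstName><DOB>01/01/1990</DOB>
    <Rapsheet><CriminalHistory>
      <Cycle><CycleNumber>1</CycleNumber>
        <Arrest><DateOfArrest>10/01/2012 </DateOfArrest></Arrest></Cycle>
      <Cycle><CycleNumber>2</CycleNumber>
        <Arrest><DateOfArrest>09/30/2012 </DateOfArrest></Arrest></Cycle>
    </CriminalHistory></Rapsheet>
  </Record>
  <Record><LastName>ROE</LastName><FirstName>RICHARD</FirstName><DOB>02/02/1985</DOB>
    <Rapsheet><CriminalHistory>
      <Cycle><CycleNumber>1</CycleNumber>
        <Arrest><DateOfArrest>03/03/2013</DateOfArrest></Arrest></Cycle>
    </CriminalHistory></Rapsheet>
  </Record>
</Records>'

doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: trimmed text of the first match, or NA if the node is absent
first_value <- function(node, path) {
  hits <- getNodeSet(node, path)
  if (length(hits) == 0) NA_character_ else trimws(xmlValue(hits[[1]]))
}

# One row per Cycle; Record-level fields are looked up from the Cycle itself
rows <- lapply(getNodeSet(doc, "//Cycle"), function(cyc) {
  data.frame(
    LastName     = first_value(cyc, "ancestor::Record/LastName"),
    FirstName    = first_value(cyc, "ancestor::Record/FirstName"),
    DOB          = first_value(cyc, "ancestor::Record/DOB"),
    CycleNumber  = first_value(cyc, "./CycleNumber"),
    DateOfArrest = first_value(cyc, "./Arrest/DateOfArrest"),
    stringsAsFactors = FALSE
  )
})

cycles <- do.call(rbind, rows)
print(cycles)
```

Because each row is built from a single Cycle node, no merge key between the cycle and Record blocks is needed, and unequal numbers of cycles per person do not matter.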