I have been trying to get an XML file into a data frame but am struggling. I have tried a few approaches, and this is where I am at.
My XML file consists of about 20k segments like this:
<?xml version="1.0"?>
<data experimentId="5244" savingTime="2018-01-06T14:25:48-0500" eventType="Workflow" userId="303">
<root>
<set id="ASSAY_WORKFLOW">
<row state="MODIFIED" pk="5905_Standard_Validation_Standard_Validation">
<field name="ASSAY_ID">5244</field>
<field name="WORKFLOW_ID">5905_Standard_Validation_Standard_Validation</field>
<field name="WORKFLOW_STATE">0</field>
<field name="ASSAY_WORKFLOW_STATE">InDelegation</field>
<field name="WORKFLOW_LAST_STEP_ID">17896</field>
</row>
</set>
<set id="WORKFLOW_STEPS">
<row state="NEW" pk="17896">
<field name="STEP_ID">17896</field>
<field name="WORKFLOW_ID">5905_Standard_Validation_Standard_Validation</field>
<field name="STEP_DATE">2018-01-06T14:25:45-0500</field>
<field name="STEP_DATE_TZ">America/New_York</field>
<field name="USER_ID">303</field>
<field name="USER_FULL_NAME">Ron Swanson</field>
<field name="NEW_WORKFLOW_ASSAY_STATE">InDelegation</field>
<field name="FORMER_WORKFLOW_ASSAY_STATE">Draft</field>
<field name="ROLE_ID">1</field>
</row>
</set>
<set id="WORKFLOW_STEP_VARIABLES">
<row state="NEW" pk="17896¤nextActorId">
<field name="STEP_ID">17896</field>
<field name="VARIABLE_ID">nextActorId</field>
<field name="VALUE">2</field>
</row>
<row state="NEW" pk="17896¤validateToPendingValidation">
<field name="STEP_ID">17896</field>
<field name="VARIABLE_ID">validateToPendingValidation</field>
<field name="VALUE">false</field>
</row>
<row state="NEW" pk="17896¤signToPendingSignature">
<field name="STEP_ID">17896</field>
<field name="VARIABLE_ID">signToPendingSignature</field>
<field name="VALUE">false</field>
</row>
<row state="NEW" pk="17896¤comment">
<field name="STEP_ID">17896</field>
<field name="VARIABLE_ID">comment</field>
<field name="VALUE">GH-VAP, IgG1 repeats,</field>
</row>
<row state="NEW" pk="17896¤actionDelegateU">
<field name="STEP_ID">17896</field>
<field name="VARIABLE_ID">actionDelegateU</field>
<field name="VALUE">directDelegateU</field>
</row>
</set>
<set id="WORKFLOW_ROLE_NAMES">
<row state="NEW" pk="1">
<field name="ROLE_ID">1</field>
<field name="LANGUAGE_ID">2</field>
<field name="DESCRIPTION">Author</field>
</row>
</set>
</root>
</data>
Under each root node, the child elements all share the tag "field" and carry a "name" attribute. The attribute gives the name of the column I want in my data frame, and the element's text gives the value that belongs under it.
I can get everything out with this:
library(XML)
xmlfilealt <- xmlParse("data/eln_audit_workflow.xml")
username <- xpathSApply(xmlfilealt, "//field[@name='USER_FULL_NAME']", xmlValue)
title <- xpathSApply(xmlfilealt, "//field[@name='VALUE']", xmlValue)
state <- xpathSApply(xmlfilealt, "//field[@name='ASSAY_WORKFLOW_STATE']", xmlValue)
actionDate <- xpathSApply(xmlfilealt, "//field[@name='STEP_DATE']", xmlValue)
actor <- xpathSApply(xmlfilealt, "//field[@name='DESCRIPTION']", xmlValue)
I planned to create a data.frame from them, but the vectors are all slightly different lengths, which I assume is because some elements are missing in some of the root nodes. Can someone clue me in on how to handle this?
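To make the goal concrete, here is a rough sketch of the shape I think the solution needs: loop over each segment, look up each field relative to that segment, and fill in NA when it is absent so every column ends up the same length. The inline sample, the `<audit>` wrapper element, the helper `first_field` and the `wanted` vector are all made up for illustration and are not from my real file or code.

```r
library(XML)

# Tiny inline stand-in for the real file; the <audit> wrapper is an assumption.
xml_text <- '<audit>
  <data experimentId="1">
    <root>
      <set id="WORKFLOW_STEPS"><row>
        <field name="USER_FULL_NAME">Ron Swanson</field>
        <field name="STEP_DATE">2018-01-06T14:25:45-0500</field>
      </row></set>
      <set id="WORKFLOW_ROLE_NAMES"><row>
        <field name="DESCRIPTION">Author</field>
      </row></set>
    </root>
  </data>
  <data experimentId="2">
    <root>
      <set id="WORKFLOW_STEPS"><row>
        <field name="STEP_DATE">2018-01-07T09:00:00-0500</field>
      </row></set>
    </root>
  </data>
</audit>'

doc <- xmlParse(xml_text, asText = TRUE)

# One node per segment, so each lookup below stays inside a single segment.
segments <- getNodeSet(doc, "//data")

# Hypothetical helper: first matching field value under one node, or NA if absent.
first_field <- function(node, field_name) {
  path <- sprintf(".//field[@name='%s']", field_name)  # leading "." keeps the search relative
  vals <- xpathSApply(node, path, xmlValue)
  if (length(vals) == 0) NA_character_ else vals[[1]]
}

# Field names to turn into columns (illustrative selection).
wanted <- c("USER_FULL_NAME", "ASSAY_WORKFLOW_STATE", "STEP_DATE", "DESCRIPTION")

# One named character vector per segment, then stack them into a data frame.
rows <- lapply(segments, function(seg) {
  vapply(wanted, function(f) first_field(seg, f), character(1))
})
audit_df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
```

This gives one row per segment with NA wherever a field is missing. It only takes the first match per segment, so it would not be enough for fields like VALUE, which appear several times in each segment.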
Thanks