I have a complex, deeply nested XML file that I'm trying to extract data from and convert into a data frame for subsequent plotting and analysis. Solutions in either R or Python would be fine, but I've never worked with XML files before and I'm struggling to work out how to extract the data I need (I'm reading up on XPath syntax, which is new to me).
I've tried the R packages XML, xml2, and xmltools, and I've also experimented with Python's ElementTree. Most of the examples I've followed use much simpler XML files; I haven't figured out how to extend their logic to my own case, and I've only ended up with a nonsensical mess.
The structure of the XML file is:
(1) ------------
├── XMLFILE
├── DATASET
(2) ------------
└── GROUPDATA
    └── GROUP
        ├── METHODDATA
        ├── SAMPLELISTDATA
        │   ├── SAMPLE
        │   │   ├── USERDATA
        │   │   ├── COMPOUND
        │   │   │   ├── METHOD
        │   │   │   ├── USERDATA
        │   │   │   └── PEAK
        │   │   │       └── ISPEAK
        │   │   └── COMPOUND
        │   │       ├── METHOD
        │   │       ├── USERDATA
        │   │       └── PEAK
        │   │           └── ISPEAK
        │   └── SAMPLE
        │       ├── USERDATA
        │       ├── COMPOUND
        │       │   ├── METHOD
        │       │   ├── USERDATA
        │       │   └── PEAK
        │       │       └── ISPEAK
        │       └── COMPOUND
        │           ├── METHOD
        │           ├── USERDATA
        │           └── PEAK
        │               └── ISPEAK
        └── CALIBRATIONDATA
            ├── COMPOUND
            │   ├── RESPONSE
            │   └── CURVE
            │       └── RESPONSEFACTOR
            └── COMPOUND
                ├── RESPONSE
                └── CURVE
                    ├── CALIBRATIONCURVE
                    └── DETERMINATION
I only care about what's in the SAMPLELISTDATA section. I've shown only 2 SAMPLE elements with 2 COMPOUND elements each, but the real file has many of both. Every tag in the tree also has many attributes, and those attributes hold the data I need to extract.
The actual XML is huge, but here's a (somewhat) minimal example:
<QUANDATASET description="" version="1">
<XMLFILE filename="C:\Masslynx Projects\Polyphenols_Dev.PRO\quandata.xml" modifieddate="20 Dec 2021" modifiedtime="15:53:06"/>
<DATASET filename="C:\Masslynx Projects\Polyphenols_Dev.PRO\211220_MAA_Jack.qld" modifieddate="20 Dec 2021" modifiedtime="15:50:10" creationdate="20 Dec 2021" creationtime="14:18:02"/>
<GROUPDATA count="1">
<GROUP id="1" name="MAA_JACK">
<METHODDATA id="1" filename="C:\Masslynx Projects\Polyphenols_Dev.PRO\MethDB\MAA_Jack.mdb" modifieddate="20 Dec 2021" modifiedtime="14:04:55" creationdate="20 Dec 2021" creationtime="14:04:55"/>
<SAMPLELISTDATA filename="C:\Masslynx Projects\Polyphenols_Dev.PRO\SampleDB\MAA_211220.SPL" modifieddate="20 Dec 2021" modifiedtime="09:55:58" count="12">
<SAMPLE id="1" groupid="1" name="MAA_211220_01" createdate="20-Dec-21" createtime="10:00:08" type="Analyte" desc="'Umbilicalis' laver filtrate 7D7" dilutionfac="0.0000000000" extractvolume="0.0000000000" initamount="0.0000000000" injectvolume="2.0000000000" job="MAA_211220" sampleid="" samplenumber="1" stdconc="0.0000000000" stockdilutionfac="0.0000000000" subjecttext="" subjecttime="0.0000000000" userdilutionfac="0.0000000000" vial="1:A,1" inletmethodname="C:\Masslynx Projects\Polyphenols_Dev.PRO\ACQUDB\MAA_Dev_17" msmethodname="C:\Masslynx Projects\Polyphenols_Dev.PRO\ACQUDB\MAAs SIR5.EXP" prerunmethodname="" postrunmethodname="" switchmethodname="" hplcmethodname="" tunemethodname="C:\Masslynx Projects\Histamine_QDA_Dev.PRO\ACQUDB\Default.ipr" fractionlynxname="" instrument="ACQ-QDA#KAD3691" lab="" conditions="" submitter="" task="" user="" reinjections="0" text="'Umbilicalis' laver filtrate 7D7">
<COMPOUND id="1" sampleid="1" groupid="1" name="Palythine" type="" cas="" stdconc="0.0000000000">
<PEAK foundscan="514" foundrt="1.7100000381" foundrrt="0.0000000000" predrt="1.7500000000" predrrt="0.0000000000" area="89222.9220000000" height="1567686.0000000000" response="89222.9220000000" pkflags="MM!" analconc="0.0000000000" empc="0.0000000000" bsanalconc="0.0000000000" conccalc="NaN" modifieddate="20-Dec-21" modifiedtime="14:22:50" modifiedtext="" modifieduser="" peakmass="0.0000000000" startrt="1.6399999857" endrt="1.7532999516" startht="-10476.0000000000" endht="-10476.0000000000" absresponse="89222.9220000000" rrtref="0" quanratio="0.0000000000" quanratiopred="1.0000000000" quanratiowin="0.0000000000" ionratio="0.0000000000" ionratiopred="0.0000000000" ionratiowin="0" ionratioflag="0" chromnoise="11.0944900513" detectionthreshold="0.0000000000" detectionflag="0" quanthreshold="0.0000000000" quanflag="0" snlodflag="0" snloqflag="0" rrf="0.0000000000" chromtrace="318_322" peaks="0" pkwidth="3.0210000000" pksigma="6.3800000000" pkskew="-0.1190000000" pkkurt="-0.4500000000" heightdivarea="17.5704400266" baselinewidth="6.7979979515" peakquality="n/a" peakqualitydesc="" peakqualityref="N" replimflag="0" maxreplimflag="0" recovlimflag="0" matrixblankflag="0" solventblankflag="0" devflag="0" devflagmidconc="0" devflaglowconc="0" qcsignoiseflag="0" qcionratioflag="0" qcrettimeflag="0" qcpeakshapeflag="0" signoise="141303.1146768486" signoiseflag="0" cdflag="0" stddevflag="0" rtflag="0" peakasymmetry="0" peakfrontwidth="0.0700000003" peaktailwidth="0.0430000015" peakasymmetryvalue="0.6190000176" percrecovery="0.0000000000" symflag="" percsym="0.0000000000" belowrl="1" chromnoisehgt="0.0000000000" concdevperc="0.0000000000" lowerbound1="0.0000000000" lowerbound2="0.0000000000" lowerbound3="0.0000000000" lowerbound4="0.0000000000" mediumbound1="0.0000000000" mediumbound2="0.0000000000" mediumbound3="0.0000000000" mediumbound4="0.0000000000" upperbound1="0.0000000000" upperbound2="0.0000000000" upperbound3="0.0000000000" upperbound4="0.0000000000" nosolflag="0" 
peakmissing="0" peaksinc="0" toxconc1="0.0000000000" toxconc2="0.0000000000" toxconc3="0.0000000000" toxconc4="0.0000000000" toxfactor1="0.0000000000" toxfactor2="0.0000000000" toxfactor3="0.0000000000" toxfactor4="0.0000000000" toxlod1="0.0000000000" toxlod2="0.0000000000" toxlod3="0.0000000000" toxlod4="0.0000000000" toxloq1="0.0000000000" toxloq2="0.0000000000" toxloq3="0.0000000000" toxloq4="0.0000000000" userfactor="1.0000000000" userrf="0.0000000000" picsforward="0" picsreverse="0" iFIT="N/A" iFITnorm="N/A" iFITconfidence="N/A" foundmass="N/A" mDamasserror="N/A" ppmmasserror="N/A" iFitflag="0" iFitnormflag="0" iFitconfflag="0" mDaerrorflag="0" ppmerrorflag="0">
<ISPEAK area="" height="" foundrt="" absresponse=""/>
</PEAK>
<METHOD rref="0.0000000000" predrt="1.7500000000" predrrt="0.0000000000" userfactor="0.0000000000" userrf="0.0000000000" quantrace="318_322" secondarytrace="" useabsmasswin="1" chromasswinabs="1.0000000000" chromasswinppm="10.0000000000" stockconcfactor="0.0000000000" calibref="Palythine" replim="0.0000000000" replimflag="0" maxreplim="0.0000000000" maxreplimflag="0" minrecovlim="0.0000000000" maxrecovlim="100.0000000000" recovlimflag="0" maxstddev="0.0000000000" signoiseflag="0" mincoeffdet="0.5000000000" cdflag="0" minpeakwidth="0.0000000000" peakwidthtol="0.0000000000" peakwidthflag="0" blanklevel="0.0000000000" stddevflag="0" rtupper="0.0000000000" rtlower="0.0000000000" rtflag="0"/>
<USERDATA sampleid="1" groupid="1"/>
</COMPOUND>
<COMPOUND id="14" sampleid="1" groupid="1" name="Porphyra 334 SIR" type="" cas="" stdconc="0.0000000000">
<PEAK foundscan="161" foundrt="3.3292999268" foundrrt="0.0000000000" predrt="3.3099999428" predrrt="1.0000000000" area="2140861.2500000000" height="16134221.0000000000" response="2140861.2500000000" pkflags="bb" analconc="0.0000000000" empc="0.0000000000" bsanalconc="0.0000000000" conccalc="NaN" modifieddate="" modifiedtime="" modifiedtext="" modifieduser="" peakmass="0.0000000000" startrt="3.1303999424" endrt="3.7107000351" startht="3651.8000000000" endht="16670.4000000000" absresponse="2140861.2500000000" rrtref="0" quanratio="0.0000000000" quanratiopred="1.0000000000" quanratiowin="0.0000000000" ionratio="0.0000000000" ionratiopred="0.0000000000" ionratiowin="0" ionratioflag="0" chromnoise="334.2170715332" detectionthreshold="0.0000000000" detectionflag="0" quanthreshold="0.0000000000" quanflag="0" snlodflag="0" snloqflag="0" rrf="0.0000000000" chromtrace="347.1" peaks="0" pkwidth="7.7870000000" pksigma="3.2770000000" pkskew="0.6590000000" pkkurt="1.4860000000" heightdivarea="7.5363225898" baselinewidth="34.8180055618" peakquality="n/a" peakqualitydesc="" peakqualityref="N" replimflag="0" maxreplimflag="0" recovlimflag="0" matrixblankflag="0" solventblankflag="0" devflag="0" devflagmidconc="0" devflaglowconc="0" qcsignoiseflag="0" qcionratioflag="0" qcrettimeflag="0" qcpeakshapeflag="0" signoise="48274.6764729440" signoiseflag="0" cdflag="0" stddevflag="0" rtflag="0" peakasymmetry="0" peakfrontwidth="0.2000000030" peaktailwidth="0.3799999952" peakasymmetryvalue="1.8999999762" percrecovery="0.0000000000" symflag="" percsym="0.0000000000" belowrl="1" chromnoisehgt="6160.2280000000" concdevperc="0.0000000000" lowerbound1="0.0000000000" lowerbound2="0.0000000000" lowerbound3="0.0000000000" lowerbound4="0.0000000000" mediumbound1="0.0000000000" mediumbound2="0.0000000000" mediumbound3="0.0000000000" mediumbound4="0.0000000000" upperbound1="0.0000000000" upperbound2="0.0000000000" upperbound3="0.0000000000" upperbound4="0.0000000000" nosolflag="0" peakmissing="0" 
peaksinc="0" toxconc1="0.0000000000" toxconc2="0.0000000000" toxconc3="0.0000000000" toxconc4="0.0000000000" toxfactor1="0.0000000000" toxfactor2="0.0000000000" toxfactor3="0.0000000000" toxfactor4="0.0000000000" toxlod1="0.0000000000" toxlod2="0.0000000000" toxlod3="0.0000000000" toxlod4="0.0000000000" toxloq1="0.0000000000" toxloq2="0.0000000000" toxloq3="0.0000000000" toxloq4="0.0000000000" userfactor="1.0000000000" userrf="0.0000000000" picsforward="0" picsreverse="0" iFIT="N/A" iFITnorm="N/A" iFITconfidence="N/A" foundmass="N/A" mDamasserror="N/A" ppmmasserror="N/A" iFitflag="0" iFitnormflag="0" iFitconfflag="0" mDaerrorflag="0" ppmerrorflag="0">
<ISPEAK area="" height="" foundrt="" absresponse=""/>
</PEAK>
<METHOD rref="0.0000000000" predrt="3.3099999428" predrrt="1.0000000000" userfactor="0.0000000000" userrf="0.0000000000" quantrace="347.1" secondarytrace="" useabsmasswin="1" chromasswinabs="1.0000000000" chromasswinppm="10.0000000000" stockconcfactor="0.0000000000" calibref="Porphyra 334 SIR" replim="0.0000000000" replimflag="0" maxreplim="0.0000000000" maxreplimflag="0" minrecovlim="0.0000000000" maxrecovlim="100.0000000000" recovlimflag="0" maxstddev="0.0000000000" signoiseflag="0" mincoeffdet="0.5000000000" cdflag="0" minpeakwidth="0.0000000000" peakwidthtol="0.0000000000" peakwidthflag="0" blanklevel="0.0000000000" stddevflag="0" rtupper="0.0000000000" rtlower="0.0000000000" rtflag="0"/>
<USERDATA sampleid="1" groupid="1"/>
</COMPOUND>
<USERDATA sampleid="1" groupid="1"/>
</SAMPLE>
<SAMPLE id="2" groupid="1" name="MAA_211220_02" createdate="20-Dec-21" createtime="10:11:04" type="Analyte" desc="'Umbilicalis' laver filtrate 3D9" dilutionfac="0.0000000000" extractvolume="0.0000000000" initamount="0.0000000000" injectvolume="2.0000000000" job="MAA_211220" sampleid="" samplenumber="2" stdconc="0.0000000000" stockdilutionfac="0.0000000000" subjecttext="" subjecttime="0.0000000000" userdilutionfac="0.0000000000" vial="1:A,2" inletmethodname="C:\Masslynx Projects\Polyphenols_Dev.PRO\ACQUDB\MAA_Dev_17" msmethodname="C:\Masslynx Projects\Polyphenols_Dev.PRO\ACQUDB\MAAs SIR5.EXP" prerunmethodname="" postrunmethodname="" switchmethodname="" hplcmethodname="" tunemethodname="C:\Masslynx Projects\Histamine_QDA_Dev.PRO\ACQUDB\Default.ipr" fractionlynxname="" instrument="ACQ-QDA#KAD3691" lab="" conditions="" submitter="" task="" user="" reinjections="0" text="'Umbilicalis' laver filtrate 3D9">
<COMPOUND id="1" sampleid="2" groupid="1" name="Palythine" type="" cas="" stdconc="0.0000000000">
<PEAK foundscan="517" foundrt="1.7200000286" foundrrt="0.0000000000" predrt="1.7500000000" predrrt="0.0000000000" area="69654.0080000000" height="1250121.0000000000" response="69654.0080000000" pkflags="MM!" analconc="0.0000000000" empc="0.0000000000" bsanalconc="0.0000000000" conccalc="NaN" modifieddate="20-Dec-21" modifiedtime="14:24:57" modifiedtext="" modifieduser="" peakmass="0.0000000000" startrt="1.6000000238" endrt="1.7599999905" startht="0.0000000000" endht="10847.0340000000" absresponse="69654.0080000000" rrtref="0" quanratio="0.0000000000" quanratiopred="1.0000000000" quanratiowin="0.0000000000" ionratio="0.0000000000" ionratiopred="0.0000000000" ionratiowin="0" ionratioflag="0" chromnoise="4.1693286896" detectionthreshold="0.0000000000" detectionflag="0" quanthreshold="0.0000000000" quanflag="0" snlodflag="0" snloqflag="0" rrf="0.0000000000" chromtrace="318_322" peaks="0" pkwidth="3.0090000000" pksigma="6.4940000000" pkskew="-0.4530000000" pkkurt="0.7820000000" heightdivarea="17.9475817099" baselinewidth="9.5999979973" peakquality="n/a" peakqualitydesc="" peakqualityref="N" replimflag="0" maxreplimflag="0" recovlimflag="0" matrixblankflag="0" solventblankflag="0" devflag="0" devflagmidconc="0" devflaglowconc="0" qcsignoiseflag="0" qcionratioflag="0" qcrettimeflag="0" qcpeakshapeflag="0" signoise="299837.4781816338" signoiseflag="0" cdflag="0" stddevflag="0" rtflag="0" peakasymmetry="0" peakfrontwidth="0.1199999973" peaktailwidth="0.0399999991" peakasymmetryvalue="0.3330000043" percrecovery="0.0000000000" symflag="" percsym="0.0000000000" belowrl="1" chromnoisehgt="0.0000000000" concdevperc="0.0000000000" lowerbound1="0.0000000000" lowerbound2="0.0000000000" lowerbound3="0.0000000000" lowerbound4="0.0000000000" mediumbound1="0.0000000000" mediumbound2="0.0000000000" mediumbound3="0.0000000000" mediumbound4="0.0000000000" upperbound1="0.0000000000" upperbound2="0.0000000000" upperbound3="0.0000000000" upperbound4="0.0000000000" nosolflag="0" 
peakmissing="0" peaksinc="0" toxconc1="0.0000000000" toxconc2="0.0000000000" toxconc3="0.0000000000" toxconc4="0.0000000000" toxfactor1="0.0000000000" toxfactor2="0.0000000000" toxfactor3="0.0000000000" toxfactor4="0.0000000000" toxlod1="0.0000000000" toxlod2="0.0000000000" toxlod3="0.0000000000" toxlod4="0.0000000000" toxloq1="0.0000000000" toxloq2="0.0000000000" toxloq3="0.0000000000" toxloq4="0.0000000000" userfactor="1.0000000000" userrf="0.0000000000" picsforward="0" picsreverse="0" iFIT="N/A" iFITnorm="N/A" iFITconfidence="N/A" foundmass="N/A" mDamasserror="N/A" ppmmasserror="N/A" iFitflag="0" iFitnormflag="0" iFitconfflag="0" mDaerrorflag="0" ppmerrorflag="0">
<ISPEAK area="" height="" foundrt="" absresponse=""/>
</PEAK>
<METHOD rref="0.0000000000" predrt="1.7500000000" predrrt="0.0000000000" userfactor="0.0000000000" userrf="0.0000000000" quantrace="318_322" secondarytrace="" useabsmasswin="1" chromasswinabs="1.0000000000" chromasswinppm="10.0000000000" stockconcfactor="0.0000000000" calibref="Palythine" replim="0.0000000000" replimflag="0" maxreplim="0.0000000000" maxreplimflag="0" minrecovlim="0.0000000000" maxrecovlim="100.0000000000" recovlimflag="0" maxstddev="0.0000000000" signoiseflag="0" mincoeffdet="0.5000000000" cdflag="0" minpeakwidth="0.0000000000" peakwidthtol="0.0000000000" peakwidthflag="0" blanklevel="0.0000000000" stddevflag="0" rtupper="0.0000000000" rtlower="0.0000000000" rtflag="0"/>
<USERDATA sampleid="2" groupid="1"/>
</COMPOUND>
<COMPOUND id="14" sampleid="2" groupid="1" name="Porphyra 334 SIR" type="" cas="" stdconc="0.0000000000">
<PEAK foundscan="162" foundrt="3.3459000587" foundrrt="0.0000000000" predrt="3.3099999428" predrrt="1.0000000000" area="1934833.8750000000" height="14881056.0000000000" response="1934833.8750000000" pkflags="bb" analconc="0.0000000000" empc="0.0000000000" bsanalconc="0.0000000000" conccalc="NaN" modifieddate="" modifiedtime="" modifiedtext="" modifieduser="" peakmass="0.0000000000" startrt="3.1800999641" endrt="3.7107000351" startht="5267.0000000000" endht="16324.8000000000" absresponse="1934833.8750000000" rrtref="0" quanratio="0.0000000000" quanratiopred="1.0000000000" quanratiowin="0.0000000000" ionratio="0.0000000000" ionratiopred="0.0000000000" ionratiowin="0" ionratioflag="0" chromnoise="208.7208557129" detectionthreshold="0.0000000000" detectionflag="0" quanthreshold="0.0000000000" quanflag="0" snlodflag="0" snloqflag="0" rrf="0.0000000000" chromtrace="347.1" peaks="0" pkwidth="7.5160000000" pksigma="3.2120000000" pkskew="0.6470000000" pkkurt="1.3920000000" heightdivarea="7.6911285213" baselinewidth="31.8360042572" peakquality="n/a" peakqualitydesc="" peakqualityref="N" replimflag="0" maxreplimflag="0" recovlimflag="0" matrixblankflag="0" solventblankflag="0" devflag="0" devflagmidconc="0" devflaglowconc="0" qcsignoiseflag="0" qcionratioflag="0" qcrettimeflag="0" qcpeakshapeflag="0" signoise="71296.4497446734" signoiseflag="0" cdflag="0" stddevflag="0" rtflag="0" peakasymmetry="0" peakfrontwidth="0.1669999957" peaktailwidth="0.3639999926" peakasymmetryvalue="2.1860001087" percrecovery="0.0000000000" symflag="" percsym="0.0000000000" belowrl="1" chromnoisehgt="5185.1130000000" concdevperc="0.0000000000" lowerbound1="0.0000000000" lowerbound2="0.0000000000" lowerbound3="0.0000000000" lowerbound4="0.0000000000" mediumbound1="0.0000000000" mediumbound2="0.0000000000" mediumbound3="0.0000000000" mediumbound4="0.0000000000" upperbound1="0.0000000000" upperbound2="0.0000000000" upperbound3="0.0000000000" upperbound4="0.0000000000" nosolflag="0" peakmissing="0" 
peaksinc="0" toxconc1="0.0000000000" toxconc2="0.0000000000" toxconc3="0.0000000000" toxconc4="0.0000000000" toxfactor1="0.0000000000" toxfactor2="0.0000000000" toxfactor3="0.0000000000" toxfactor4="0.0000000000" toxlod1="0.0000000000" toxlod2="0.0000000000" toxlod3="0.0000000000" toxlod4="0.0000000000" toxloq1="0.0000000000" toxloq2="0.0000000000" toxloq3="0.0000000000" toxloq4="0.0000000000" userfactor="1.0000000000" userrf="0.0000000000" picsforward="0" picsreverse="0" iFIT="N/A" iFITnorm="N/A" iFITconfidence="N/A" foundmass="N/A" mDamasserror="N/A" ppmmasserror="N/A" iFitflag="0" iFitnormflag="0" iFitconfflag="0" mDaerrorflag="0" ppmerrorflag="0">
<ISPEAK area="" height="" foundrt="" absresponse=""/>
</PEAK>
<METHOD rref="0.0000000000" predrt="3.3099999428" predrrt="1.0000000000" userfactor="0.0000000000" userrf="0.0000000000" quantrace="347.1" secondarytrace="" useabsmasswin="1" chromasswinabs="1.0000000000" chromasswinppm="10.0000000000" stockconcfactor="0.0000000000" calibref="Porphyra 334 SIR" replim="0.0000000000" replimflag="0" maxreplim="0.0000000000" maxreplimflag="0" minrecovlim="0.0000000000" maxrecovlim="100.0000000000" recovlimflag="0" maxstddev="0.0000000000" signoiseflag="0" mincoeffdet="0.5000000000" cdflag="0" minpeakwidth="0.0000000000" peakwidthtol="0.0000000000" peakwidthflag="0" blanklevel="0.0000000000" stddevflag="0" rtupper="0.0000000000" rtlower="0.0000000000" rtflag="0"/>
<USERDATA sampleid="2" groupid="1"/>
</COMPOUND>
<USERDATA sampleid="2" groupid="1"/>
</SAMPLE>
</SAMPLELISTDATA>
<CALIBRATIONDATA filename="C:\Masslynx Projects\Caffeine.PRO\CurveDB\Meth1.cdb" modifieddate="25 Sep 2015" modifiedtime="00:20:14" count="2">
<COMPOUND id="1" name="Compound A ( 430.5 )">
<RESPONSE type="External Std" ref="" rah="Area"/>
<CURVE type="RF" origin="" weighting="" axistrans="">
<RESPONSEFACTOR cc="15552.5556000000" stddev="2208.2674143620" percrelsd="0.1319874310"/>
</CURVE>
</COMPOUND>
<COMPOUND id="2" name="Compound B ( 458.5 )">
<RESPONSE type="Internal Std" ref="1" rah="Area * ( IS Conc. / IS Area )"/>
<CURVE type="Linear" origin="Exclude" weighting="1/x" axistrans="None">
<CALIBRATIONCURVE curve="0.012594 * x + 0.005516"/>
<DETERMINATION rsquared="0.9741537568"/>
</CURVE>
</COMPOUND>
</CALIBRATIONDATA>
</GROUP>
</GROUPDATA>
</QUANDATASET>
What I'm trying to get to is a single data frame (in either R or Python/pandas) where each row holds all of the data (attributes) associated with one SAMPLE/COMPOUND pair. My example above has 2 samples with 2 compounds each, so it should produce 4 rows, with a great many columns covering the attributes of every node and child node associated with each pair.
A list of data frames, one per sample, would also work, but then each data frame in the list would need to be tied to its sample name, so I think one big data frame might be easier.
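To make the target shape concrete, here is a rough sketch of one row per SAMPLE/COMPOUND pair, using only Python's standard-library `xml.etree.ElementTree`. It is not a tested solution for the real file. The inline XML is a heavily trimmed stand-in for the example above, and the helper names (`prefixed`, `sample_compound_rows`) and the `tag.attribute` column naming are assumptions made for illustration. The prefixes are there because some attribute names (for example `predrt` and `userfactor`) appear on more than one tag.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the real file: same nesting, only a few attributes kept.
XML = """<QUANDATASET description="" version="1">
  <GROUPDATA count="1">
    <GROUP id="1" name="MAA_JACK">
      <SAMPLELISTDATA count="2">
        <SAMPLE id="1" name="MAA_211220_01" type="Analyte">
          <COMPOUND id="1" sampleid="1" name="Palythine">
            <PEAK foundrt="1.7100000381" area="89222.9220000000" predrt="1.7500000000">
              <ISPEAK area="" height=""/>
            </PEAK>
            <METHOD predrt="1.7500000000" quantrace="318_322"/>
            <USERDATA sampleid="1" groupid="1"/>
          </COMPOUND>
          <COMPOUND id="14" sampleid="1" name="Porphyra 334 SIR">
            <PEAK foundrt="3.3292999268" area="2140861.2500000000" predrt="3.3099999428">
              <ISPEAK area="" height=""/>
            </PEAK>
            <METHOD predrt="3.3099999428" quantrace="347.1"/>
            <USERDATA sampleid="1" groupid="1"/>
          </COMPOUND>
          <USERDATA sampleid="1" groupid="1"/>
        </SAMPLE>
        <SAMPLE id="2" name="MAA_211220_02" type="Analyte">
          <COMPOUND id="1" sampleid="2" name="Palythine">
            <PEAK foundrt="1.7200000286" area="69654.0080000000" predrt="1.7500000000">
              <ISPEAK area="" height=""/>
            </PEAK>
            <METHOD predrt="1.7500000000" quantrace="318_322"/>
            <USERDATA sampleid="2" groupid="1"/>
          </COMPOUND>
          <COMPOUND id="14" sampleid="2" name="Porphyra 334 SIR">
            <PEAK foundrt="3.3459000587" area="1934833.8750000000" predrt="3.3099999428">
              <ISPEAK area="" height=""/>
            </PEAK>
            <METHOD predrt="3.3099999428" quantrace="347.1"/>
            <USERDATA sampleid="2" groupid="1"/>
          </COMPOUND>
          <USERDATA sampleid="2" groupid="1"/>
        </SAMPLE>
      </SAMPLELISTDATA>
      <CALIBRATIONDATA count="1">
        <COMPOUND id="1" name="Compound A ( 430.5 )"/>
      </CALIBRATIONDATA>
    </GROUP>
  </GROUPDATA>
</QUANDATASET>"""


def prefixed(element, prefix):
    """Return the element's attributes with `prefix.` added to each key."""
    return {f"{prefix}.{key}": value for key, value in element.attrib.items()}


def sample_compound_rows(root):
    """Build one flat dict per SAMPLE/COMPOUND pair under SAMPLELISTDATA."""
    rows = []
    # The path is anchored on SAMPLELISTDATA, so the COMPOUND elements
    # under CALIBRATIONDATA are never visited.
    for sample in root.iterfind(".//SAMPLELISTDATA/SAMPLE"):
        sample_cols = prefixed(sample, "sample")
        for compound in sample.iterfind("COMPOUND"):
            row = dict(sample_cols)  # repeat the sample attributes on every row
            row.update(prefixed(compound, "compound"))
            # PEAK and METHOD are direct children of COMPOUND; ISPEAK sits inside PEAK.
            for path, prefix in (("PEAK", "peak"),
                                 ("PEAK/ISPEAK", "ispeak"),
                                 ("METHOD", "method")):
                child = compound.find(path)
                if child is not None:
                    row.update(prefixed(child, prefix))
            rows.append(row)
    return rows


# For a file on disk, ET.parse(path).getroot() would replace ET.fromstring(XML).
root = ET.fromstring(XML)
rows = sample_compound_rows(root)
print(len(rows))  # 4: two samples times two compounds
```

Passing `rows` to `pandas.DataFrame(rows)` should then give the single data frame described above, with one column per prefixed attribute. Every value comes out of the XML as a string, so numeric columns such as `peak.area` would still need converting before plotting.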
Thanks so much for any help/insights/tips/advice.