I usually extract this data into a CSV file. I can't do that this time because the text in the notes column is too long and gets truncated when I export to CSV. If I export to XML instead, I think I get the whole text, but I'm struggling to work with the XML output.
I'm using xml2 in RStudio. I tried xml_children(notes) and a few other things, like as_list(xml_children(notes)).
The XML looks like this:
<RESULTS>
<ROW>
<COLUMN NAME="SUBJECT_ID"><![CDATA[12345678]]></COLUMN>
<COLUMN NAME="PAT_ID"><![CDATA[12345678]]></COLUMN>
<COLUMN NAME="PAT_MRN_ID"><![CDATA[12345678]]></COLUMN>
<COLUMN NAME="PAT_ENC_CSN_ID"><![CDATA[222111333]]></COLUMN>
<COLUMN NAME="CREATE_INSTANT_DTTM"><![CDATA[18-JUL-01]]></COLUMN>
<COLUMN NAME="NAME"><![CDATA[Progress Notes]]></COLUMN>
<COLUMN NAME="NOTE_ID"><![CDATA[123456]]></COLUMN>
<COLUMN NAME="LINE"><![CDATA[1]]></COLUMN>
<COLUMN NAME="NOTE_TEXT"><![CDATA[ text text text]]></COLUMN>
</ROW>
</RESULTS>
I want a data frame with one column per COLUMN NAME attribute, i.e. notes$SUBJECT_ID, notes$PAT_ID... notes$NOTE_TEXT, and one row per ROW element. How can I build that?
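For reference, here is a minimal sketch of one possible approach with xml2. It assumes that every ROW contains the same set of COLUMN elements in the same order, and it parses a shortened inline copy of the sample instead of the real export. The names xml_sample and row_to_df are made up for illustration, and this is not necessarily the best or fastest way to do it.

```r
library(xml2)

# Shortened inline copy of the sample above. For a real export you would
# pass the file path to read_xml() instead of a string.
xml_sample <- '<RESULTS>
<ROW>
<COLUMN NAME="SUBJECT_ID"><![CDATA[12345678]]></COLUMN>
<COLUMN NAME="NOTE_ID"><![CDATA[123456]]></COLUMN>
<COLUMN NAME="NOTE_TEXT"><![CDATA[ text text text]]></COLUMN>
</ROW>
</RESULTS>'

doc  <- read_xml(xml_sample)
rows <- xml_find_all(doc, "//ROW")

# Hypothetical helper: turn one ROW node into a one-row data frame whose
# column names come from the NAME attribute of each COLUMN child.
row_to_df <- function(row) {
  cols   <- xml_find_all(row, "./COLUMN")
  values <- as.list(xml_text(cols))        # CDATA content as character strings
  names(values) <- xml_attr(cols, "NAME")
  as.data.frame(values, stringsAsFactors = FALSE)
}

# Stack the one-row data frames; this relies on all rows sharing the same columns.
notes <- do.call(rbind, lapply(rows, row_to_df))
```

With this, notes$NOTE_TEXT holds the full note text as a character column. All values come back as character strings, so IDs and dates would still need converting if numeric or date types are wanted.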