What is the fastest way to convert XML files to data frames in R currently?
The XML looks like this (note: not all rows have all fields):
<row>
<ID>001</ID>
<age>50</age>
<field3>blah</field3>
<field4 />
</row>
<row>
<ID>001</ID>
<age>50</age>
<field4 />
</row>
I have tried two approaches:
- The xmlToDataFrame function from the XML package
- The speed-oriented xmlToDF function posted here
For an 8.5 MB file, with 1.6k "rows" and 114 "columns", xmlToDataFrame took 25.1 seconds, while xmlToDF took 16.7 seconds on my machine.
These times are quite large compared with Python XML parsers (e.g. xml.etree.ElementTree), which did the same job in 0.4 seconds.
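The Python side of the comparison can be sketched roughly as follows. This is a minimal illustration rather than the exact script that was timed: it assumes the rows sit under a single root element (called `<rows>` here, which is not shown in the sample above), and the helper names `rows_to_records` and `to_table` are made up for this example. Fields missing from a row are filled with `None`.

```python
import xml.etree.ElementTree as ET

# Sample mirroring the rows above; the <rows> wrapper is an assumption,
# since the real file's root element is not shown.
SAMPLE = """<rows>
<row><ID>001</ID><age>50</age><field3>blah</field3><field4 /></row>
<row><ID>001</ID><age>50</age><field4 /></row>
</rows>"""


def rows_to_records(root):
    """Turn each <row> into a dict of tag -> text (hypothetical helper).

    Fields a row does not contain are simply absent from its dict;
    empty elements such as <field4 /> map to None.
    """
    return [{child.tag: child.text for child in row} for row in root.iter("row")]


def to_table(records):
    """Build a column list and rectangular rows (hypothetical helper).

    Columns are collected in first-seen order; missing fields become None.
    """
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    table = [[rec.get(col) for col in columns] for rec in records]
    return columns, table


# For a file on disk, ET.parse(path).getroot() would replace ET.fromstring.
root = ET.fromstring(SAMPLE)
records = rows_to_records(root)
columns, table = to_table(records)

print(columns)
for line in table:
    print(line)
```

The whole document is parsed once and each row is visited once, so the work grows linearly with the number of rows and fields.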
Is there a faster way to do this in R, or is there something fundamental in R that prevents us from making this faster?
Any light on this would be really helpful!