
I am new to XML. I used the XML package in R to parse XML content into R objects. I have to deal with nearly 1 TB of XML data, and it took around 5 hours to parse 2.4 GB. I know that an XML schema is used to generate the XML. Is there a better way to convert XML to data, or a way to use the XML schema to read the XML and put the values back into raw data, other than xmlParse? I have 5 XML schemas along with the XML, which I think is fairly complex. The namespace declarations are:

  • xmlns:nxce="http://tfm.faa.gov/tfms/NasXCoreElements"
  • xmlns:mmd="http://tfm.faa.gov/tfms/MessageMetaData"
  • xmlns:nxcm="http://tfm.faa.gov/tfms/NasXCommonMessages"
  • xmlns:idr="http://tfm.faa.gov/tfms/TFMS_IDRS"
  • xmlns:xis="http://tfm.faa.gov/tfms/TFMS_XIS"
  • xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  • xsi:schemaLocation="http://tfm.faa.gov/tfms/TFMS_XIS

Sample data: http://www.fly.faa.gov/ASDI/asdidocs/asdi_sample_data.zip

I want to extract all flightManagementInfomation data using SAX.

Thanks in advance.

C Doan
Using the [event parse](http://stackoverflow.com/questions/7536754/storing-specific-xml-node-values-with-rs-xmleventparse/7547433#7547433) model might be memory and time efficient; you'll need to provide more detail and, at the same time, a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Martin Morgan Aug 18 '13 at 20:23

Using schemas won't improve the performance of XML loading: they tell you something about the expected structure of the parsed XML, but have nothing to do with the parsing process itself. – MiMo Aug 19 '13 at 14:58

1 Answer


Using schemas won't improve the performance of XML loading: they tell you something about the expected structure of the parsed XML, but have nothing to do with the parsing process itself.

You need to use a different parser if one is available in R (as Martin suggested), or use some other language to convert the XML data into a format that R can handle more easily.
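As a rough sketch of the second option, assuming Python is the "other language": the standard library's `xml.etree.ElementTree.iterparse` reads the file as a stream of events, so each record can be flattened to a row and discarded before the next one is read. The helper names (`local_name`, `stream_records`, `write_csv`) and the `record_tag` and `fieldnames` parameters are made up for illustration; the element name and the columns you actually need depend on your schemas and are not taken from the sample data.

```python
import csv
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace-uri}' prefix that ElementTree puts on names."""
    return tag.rsplit("}", 1)[-1]


def stream_records(source, record_tag):
    """Yield one flat dict per element whose local name is record_tag.

    source is a file path or a binary file object. Child text is stored
    under the child's local name and attributes under 'element@attribute'.
    Repeated names inside one record overwrite each other (a simplification).
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != record_tag:
            continue
        row = {}
        for node in elem.iter():
            name = local_name(node.tag)
            for key, value in node.attrib.items():
                row[name + "@" + local_name(key)] = value
            text = (node.text or "").strip()
            if text:
                row[name] = text
        yield row
        # Release what has been parsed so far so memory use stays bounded.
        elem.clear()
        root.clear()


def write_csv(rows, out, fieldnames):
    """Write rows to the text file object out; return the number of rows."""
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            restval="", extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

You would pass `stream_records(path, "yourElementName")` to `write_csv` together with the list of columns you want, and then load the resulting CSV in R with `read.csv`. Because only one record is held in memory at a time, this scales with file size in time but not in memory; whether it is fast enough for 1 TB is something you would have to measure.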

MiMo