
I am reading a large .xml file (about 1 GB), extracting information from its nodes, and assigning it to the fields of an object of my class.

The XML file contains information about a large number of workers: ID, location, gender and so on. The attributes present vary from worker to worker: one worker might have only an ID and a location, while another has only an ID and a gender, as in the following:

<workers>
    <row Id="1" Location="Bos" Gender="M" />
    <row Id="2" Gender="F" />
    <row Id="3" Location="Cal" />
    ....

My naive approach is to open the file with ifstream, read it line by line with getline(), extract the information into the string fields of an object, and then save the object to a container. This works, but it uses about 1 GB of memory.

I previously tried reading the XML file with Boost, but worker.Gender = child.second.get<string>("<xmlattr>.Gender"); did not work for every node: some workers have no gender information, and for those the call fails with an error because the Gender attribute is missing.

So my question is: what is a good way to extract the information from this XML file with low memory usage? Can it be reduced to about 100 MB, and if so, how? Also, why is the memory not released after getline() moves on to the next line of text?

Ricky_Lab
  • Any attempt to process an XML file that does not involve a proper XML parser (one that actually understands XML) will always end in guaranteed tears. Most modern XML libraries even feature an incremental SAX parser that parses the XML piece by piece, to avoid loading the whole thing into memory. – Sam Varshavchik Apr 10 '22 at 00:11
  • To answer the question, I would like to know more about the "and so on". Which attributes exactly are stored? Only "ID", "location" and "Gender"? Or more? Which ones? And is the structure of the XML as shown in your example, just "Workers" and then lines with the attributes, or is it more complex? If it is as simple as the example above, then no XML library would even be needed. In that case the size of the source file does not matter that much; the size of the real content does. Can you give more info? – A M Apr 10 '22 at 08:23
  • Sorry, but specific software recommendations are not an appropriate question for Stack Overflow. – Sam Varshavchik Apr 10 '22 at 12:09
  • @ArminMontigny My XML file has only the simple format that I showed, and every row consists of "ID", "location", "Gender", "position" and other attributes. Each row has either all of the attributes or only some of them. That is all. – Ricky_Lab Apr 10 '22 at 22:39
  • This post suggests using XmlReader for very large files: https://stackoverflow.com/questions/15772031/how-to-parse-very-huge-xml-files-in-c – BillWee Apr 13 '22 at 21:22

0 Answers