3

Any help would be appreciated, even if it's just a quick idea.

No libraries (besides the STL) or external parsers.

I am supposed to create a C++ program that reads the data from an XML file and stores it in memory, but I am having a lot of trouble finding a way to do this. I was hoping to get some guidance from someone here. Regex should also be used to recognize the file data or split it up.

Tag names do NOT need to be preserved, although that would be ideal; only the nesting and the data matter. All the data is stored as text (string).

This is an example showing what I mean by using stacks and queues. However, the solution would need to be non-specific to this example.

<House>                 // tag: push <House> on stack
  <Info>                // tag: push <Info> on stack
    <Code>ABE</Code>    // element:  push_back on element queue
    <City>Allentown</City>   // element:  push_back on element queue
    <ID>PA</ID>         // element:  push_back on element queue
  </Info>               // terminator:  pop stack and complete node in queue
  <Exact>               // tag:  push <Exact> on stack
    <X>40.65</X>        // element:  push_back on element queue
    <Y>75.43</Y>        // element:  push_back on element queue
  </Exact>              // terminator:  pop stack and complete node in queue
</House>                // terminator:  pop stack and complete node in queue

So far, it's pretty lame but I have just been able to set up the file to be read line by line and skip the header by detecting it with regex like this:

string fileline;
regex header("[<][?](.*?)[?][>]");
while (getline(ifstreamobj, fileline))
{
    if (regex_match(fileline, header))
    {
        cout << "Skipping what appears to be a header" << endl;
        continue;  // move on to the next line so the header is really skipped
    }

    //?
}
cout << "END OF FILE, EOF" << endl;

I don't really know what to do. I guess the stack would be a stack of strings where the tag name would be pushed/popped.

The queue would then hold the actual data between the tags.
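One way the stack-and-queue idea above could look in code is sketched below. This is only a rough illustration, not a full XML parser: the names `Element` and `parseElements` are made up for the example, and it assumes the input has no attributes, comments, CDATA sections, or self-closing tags. A single regex splits the text into declarations, closing tags, opening tags, and character data; a vector is used as the tag stack and a deque as the element queue.

```cpp
#include <deque>
#include <regex>
#include <string>
#include <vector>

// One piece of data plus the chain of tags that enclosed it,
// e.g. path = {"House", "Info", "Code"}, text = "ABE".
struct Element {
    std::vector<std::string> path;
    std::string text;
};

// Hypothetical helper: tokenizes the document with one regex and tracks
// nesting with a stack of open tag names.
// Assumes no attributes, comments, CDATA, or self-closing tags.
std::deque<Element> parseElements(const std::string& xml) {
    // alternative 1: <?...?> or <!...> (no capture, simply skipped)
    // group 1: closing tag name, group 2: opening tag name, group 3: text
    static const std::regex token(
        R"(<[?!][^<>]*>|</([^<>]+)>|<([^<>/]+)>|([^<>]+))");

    std::vector<std::string> open;  // used as the tag stack
    std::deque<Element> elements;   // used as the element queue

    for (std::sregex_iterator it(xml.begin(), xml.end(), token), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        if (m[1].matched) {
            // terminator: pop only if it closes the innermost open tag
            if (!open.empty() && open.back() == m[1].str()) open.pop_back();
        } else if (m[2].matched) {
            // tag: push its name on the stack
            open.push_back(m[2].str());
        } else if (m[3].matched) {
            // data: ignore whitespace-only runs between tags
            const std::string text = m[3].str();
            if (!open.empty() &&
                text.find_first_not_of(" \t\r\n") != std::string::npos) {
                elements.push_back({open, text});
            }
        }
    }
    return elements;
}
```

Each queued `Element` carries a copy of the stack at the moment its data was seen, so the nesting survives even if the tag names are later discarded. Reading the whole file into one `std::string` first (instead of line by line) keeps tags that span lines from being split.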

Kyle A
fman
  • 1
    I suggest you use some of the existing XML parsers. [This SO answer](http://stackoverflow.com/a/9387612/1593881) seems to have a nice collection. –  Apr 29 '16 at 01:56
  • 1
    Real xml parsing is no joke, do you only need to parse a subset of xml? Also your xml is malformed `40.65`. – user657267 Apr 29 '16 at 02:12
  • 1
    Whoops, fixed now. I need to parse the whole XML file as shown, but it would not go deeper than maybe 3 levels of nesting. – fman Apr 29 '16 at 02:28
  • @Raw N, fman said no external parsers. Either this is an assignment and fman didn't tell us (which fman should if that is the case), or fman is working for someone that won't allow external parsers for some reason (which should have been stated), or fman is looking for a challenge and is a glutton for punishment. – Kyle A Apr 29 '16 at 02:36
  • @KyleA It's one part of a whole project involving doing statistics on data, however, I need to parse an xml file to get that data. – fman Apr 29 '16 at 02:50

1 Answer

1

Assuming that you mean non-standard libraries when you say "no libraries", otherwise this becomes a very, very difficult task.

I would use a tree. That way in your example you would have a House node with two child nodes, Info and Exact. The Info node would have three child nodes, Code, City, and ID that would each contain a data node with the data. The Exact node would have two child nodes, X and Y, which would both contain data nodes. That's the most straightforward way I see to store this type of data.
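A minimal sketch of such a tree is shown below, under some assumptions of mine: the names `Node` and `buildTree` are invented for the example, the text is stored directly in the node as a field rather than in a separate data node, and the input is assumed to be well-formed with no attributes, comments, CDATA, or self-closing tags. It also relies on C++17 allowing `std::vector<Node>` as a member of `Node`.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical node type: a tag name, its direct text (if any), and child tags.
struct Node {
    std::string name;
    std::string text;
    std::vector<Node> children;  // vector of an incomplete type: needs C++17
};

// Sketch of a tree builder driven by one tokenizing regex.
// Assumes well-formed input; closing tag names are not checked.
Node buildTree(const std::string& xml) {
    // alternative 1: <?...?> or <!...> (skipped)
    // group 1: closing tag name, group 2: opening tag name, group 3: text
    static const std::regex token(
        R"(<[?!][^<>]*>|</([^<>]+)>|<([^<>/]+)>|([^<>]+))");

    Node root;                       // unnamed holder for the top-level tag
    std::vector<Node*> open{&root};  // stack of nodes that are still open

    for (std::sregex_iterator it(xml.begin(), xml.end(), token), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        if (m[1].matched) {
            // closing tag: go back up to the parent
            if (open.size() > 1) open.pop_back();
        } else if (m[2].matched) {
            // opening tag: append a child to the innermost open node and
            // descend into it. Only that node's child list can reallocate
            // here, and none of its elements are on the stack yet.
            open.back()->children.push_back(Node{m[2].str(), "", {}});
            open.push_back(&open.back()->children.back());
        } else if (m[3].matched) {
            // data: attach non-blank text to the innermost open tag
            const std::string text = m[3].str();
            if (open.size() > 1 &&
                text.find_first_not_of(" \t\r\n") != std::string::npos) {
                open.back()->text += text;
            }
        }
    }
    return root;
}
```

For the example in the question, `buildTree(...).children[0]` would be the House node, with Info and Exact as its two children.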

Edit: For the regex part I would try to find the matching tags and then recurse into the contents, something like "<([^/>]+)>((?:(?!</\1>).)*)</\1>", which would match an opening tag, capture the contents up to the first closing tag with the same name, and then match that closing tag. (I may be using a different syntax than your tools use, sorry. In std::regex's default ECMAScript syntax, "." does not match a newline, so "[\s\S]" is needed in its place for multi-line contents.) But this type of match only works if the same tag name cannot be used in the contents.

Matching this pattern against the following input:

<House><Mouse><House></House></Mouse></House>

would capture the tag name House and the contents <Mouse><House>, which is not what you wanted. Avoiding that false match is non-trivial.
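The false match can be reproduced with a small sketch like the one below. The pattern is my adaptation of the one in this answer to `std::regex`'s default ECMAScript syntax: the lookahead is paired with `[\s\S]` so that the contents group actually consumes one character per step. The helper name `firstElement` is made up for the example.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: returns {tag name, captured contents} for the first
// element found in xml, or two empty strings if nothing matches.
std::pair<std::string, std::string> firstElement(const std::string& xml) {
    // <name>, then any characters that do not start </name>, then </name>
    static const std::regex element(
        R"(<([^/>]+)>((?:(?!</\1>)[\s\S])*)</\1>)");

    std::smatch m;
    if (std::regex_search(xml, m, element)) {
        return {m[1].str(), m[2].str()};
    }
    return {};
}
```

Called on `<House><Mouse><House></House></Mouse></House>`, this returns the name `House` with the contents `<Mouse><House>`, because the match stops at the first `</House>` rather than the one that really closes the outer tag. On input without repeated nested names, such as `<Code>ABE</Code>`, it returns `Code` and `ABE` as intended.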

Kyle A
  • 1
    Thank you. I'm going to try making some regex keys. As for false matches, it is safe to assume all the data between the tags will just be letters. – fman Apr 29 '16 at 02:52