I know how to build a complete DOM from an XML file using just XercesDOMParser:

xercesc::XercesDOMParser *parser = new xercesc::XercesDOMParser();
parser->parse(path_to_my_file);
parser->getDocument(); // From here on I can access all nodes and do whatever I want

That works, but what if I want to parse a string instead? Something like:

std::string myxml = "<root>...</root>";
xercesc::XercesDOMParser *parser = new xercesc::XercesDOMParser();
parser->parse(myxml);
parser->getDocument(); // From here on I can access all nodes and do whatever I want

I'm using Xerces-C++ version 3. Looking at AbstractDOMParser, I see that the parse method and its overloads only seem to accept files.

How can I parse from a string?

Cœur
Andry

4 Answers


Create a MemBufInputSource and parse that:

xercesc::MemBufInputSource myxml_buf(myxml.c_str(), myxml.size(),
                                     "myxml (in memory)");
parser->parse(myxml_buf);
Fred Foo
  • It's a "fake system id" that's used in error messages and "any entities which are referred to from this entity via relative paths/URLs will be relative to this fake system id". See API docs. – Fred Foo Jan 14 '11 at 12:43
  • larsmans, could you please tell me why, when I use your code and the XML prints correctly, my app hits a segmentation fault as soon as I call Terminate()? – Andry Jan 14 '11 at 13:20
  • @Andry, I can't tell with just this info. Can you try copying the string with `new char[]` and setting the 4th (`adoptBuffer`) ctor argument to `true`? (see http://xerces.apache.org/xerces-c/apiDocs-3/classMemBufInputSource.html#f8f95589003db627e20c763227fc2b9c) – Fred Foo Jan 14 '11 at 13:42
  • Well I discovered it... see here... absurd ahaha http://xerces.apache.org/xerces-c/faq-parse-2.html#faq-7 – Andry Jan 14 '11 at 13:48
  • @Andry: I know, the Xerces rules for memory allocation are overly complicated. They seem never to have heard of RAII. Too bad. – Fred Foo Jan 14 '11 at 13:54
  • @Andry - sadly, the `absurd ahaha` link has aged into oblivion. :( Appears to have become: http://xerces.apache.org/xerces-c/faq-parse-3.html#faq-7 – Jesse Chisholm Oct 30 '15 at 16:47
  • How did you solve the seg fault? I'm having the same problem. – Mike S Nov 05 '15 at 17:38
  • Don't know if this is why any of above comments were seeing seg faults but MemBufInputSource doesn't seem to work unless you first initialize the system. More detail in answer below. – chars Apr 05 '22 at 23:07
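On the segmentation fault raised in the comments above: a common cause, and my reading of what the linked FAQ entry describes, is a Xerces object (parser, input source) whose destructor runs after XMLPlatformUtils::Terminate() has already torn the library down. The sketch below does not use Xerces at all. `FakeRuntime` and `FakeParser` are hypothetical stand-ins, used only to show how an inner scope forces the destructor to run before the teardown call.

```cpp
#include <string>
#include <vector>

// Stand-in for a library with global init/teardown (hypothetical, not the Xerces API).
struct FakeRuntime {
    static std::vector<std::string>& log() {
        static std::vector<std::string> entries;
        return entries;
    }
    static void initialize() { log().push_back("initialize"); }
    static void terminate() { log().push_back("terminate"); }
};

// Stand-in for a parser or input source whose destructor needs the runtime alive.
struct FakeParser {
    ~FakeParser() { FakeRuntime::log().push_back("parser destroyed"); }
};

// The inner braces end the parser's lifetime before terminate() runs.
inline std::vector<std::string> runWithInnerScope() {
    FakeRuntime::log().clear();
    FakeRuntime::initialize();
    {
        FakeParser parser;
        // ... parse and use the document here ...
    }
    FakeRuntime::terminate();
    return FakeRuntime::log();
}
```

Without the inner braces, `parser` would be destroyed at the end of the function, after `terminate()`. With real Xerces objects that ordering is the kind of thing that can crash.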

Use the following overload of XercesDOMParser::parse():

void XercesDOMParser::parse(const InputSource& source);

passing it a MemBufInputSource:

MemBufInputSource src((const XMLByte*)myxml.c_str(), myxml.length(), "dummy", false);
parser->parse(src);
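The fourth argument (`false` here) is the adoptBuffer flag mentioned in the comments on the previous answer. With `false`, the input source does not take ownership, so `myxml` has to stay alive for as long as the source is in use. If you would rather hand ownership over, one of those comments suggests copying the string with `new[]` and passing `true`. A minimal sketch of such a copy, using only the standard library: `copyBytesToHeap` is a hypothetical helper, and `unsigned char` stands in for `XMLByte` on the assumption that the two are the same type.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns a new[]-allocated copy of the string's bytes.
// Whoever ends up owning the pointer must release it with delete[].
inline unsigned char* copyBytesToHeap(const std::string& text) {
    const std::size_t length = text.size();
    unsigned char* buffer = new unsigned char[length + 1];
    std::memcpy(buffer, text.data(), length);
    buffer[length] = 0;  // terminator for convenience; not part of the byte count
    return buffer;
}
```

The byte count to pass alongside the copy is still `myxml.size()`, not `size() + 1`.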
Daniel Gehriger
  • How can I figure out which namespace MemBufInputSource and Wrapper4InputSource are in? I'm having serious trouble with namespaces in xerces. Ty – Silver Apr 05 '13 at 08:02
  • It's in the `xercesc` namespace, but you also need `#include <xercesc/framework/MemBufInputSource.hpp>`. I'm two years late, but I had the same issue and someone else may run into it again later. – villasv Nov 07 '15 at 05:02
  • Also +1 for also specifying the necessary cast. – villasv Nov 07 '15 at 05:03

I'm doing it another way. If this is incorrect, please tell me why. It seems to work. This is what parse expects:

DOMDocument* DOMLSParser::parse(const DOMLSInput * source )

So you need to pass a DOMLSInput instead of an InputSource:

xercesc::DOMImplementation *impl = xercesc::DOMImplementation::getImplementation();
xercesc::DOMLSParser *parser = ((xercesc::DOMImplementationLS*)impl)->createLSParser(xercesc::DOMImplementationLS::MODE_SYNCHRONOUS, 0);

// Wrapper4InputSource adapts an InputSource to the DOMLSInput interface
// and, by default, takes ownership of the wrapped source.
xercesc::Wrapper4InputSource source(new xercesc::MemBufInputSource((const XMLByte *) myxml.c_str(), myxml.size(), "A name"));
xercesc::DOMDocument *doc = parser->parse(&source);
Silver
  • Thanks for hinting this. This answer seems to be closer to the actual [DOM Programming Guide](https://xerces.apache.org/xerces-c/program-dom-3.html) – user23573 Nov 17 '15 at 10:10
  • I tried to use this but it seems to fail. When I replaced the double quotes with single quotes and added \n\ on each line, parsing seemed OK. I did that based on the MemParse.cpp sample in xerces. Do you know what the problem is? – steiryx Aug 12 '20 at 06:00
  • Hi, this is an answer from 7 years ago, so I have no clue what I was doing at the time. Maybe if you can elaborate I can think it through with you. Where/on what line did you change the quotes and add the \n? – Silver Aug 13 '20 at 11:03

You may use MemBufInputSource, which is implemented in xercesc/framework/MemBufInputSource.cpp; its header file, MemBufInputSource.hpp, contains extensive documentation. This is similar to the answers above:

#include <xercesc/framework/MemBufInputSource.hpp>

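const char* myXMLBufString = "<root>hello xml</root>";
// Pass the byte count without the terminating NUL (22 here), hence strlen.
MemBufInputSource xmlBuf((const XMLByte*)myXMLBufString, strlen(myXMLBufString), "myXMLBufName", false);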

But take note: this doesn't seem to work unless you first initialize the system, as below (taken from xerces-c-3.2.3/samples/src/SAX2Count/SAX2Count.cpp):

bool                         recognizeNEL = false;
char                         localeStr[64];
memset(localeStr, 0, sizeof localeStr);

// Initialize the XML4C2 system
try {
    if (strlen(localeStr)) {
        XMLPlatformUtils::Initialize(localeStr);
    } else {
        XMLPlatformUtils::Initialize();
    }
    if (recognizeNEL) {
        XMLPlatformUtils::recognizeNEL(recognizeNEL);
    }
} catch (const XMLException& toCatch) {
    XERCES_STD_QUALIFIER cerr << "Error during initialization! Message:\n"
            << StrX(toCatch.getMessage()) << XERCES_STD_QUALIFIER endl;
    return 1;
}

Of course, when reading a file you wouldn't normally think about this kind of preparation, since you just pass a file path to the program and the parser takes it from there. So for those experiencing seg faults, this could be the answer.
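
Returning to the byte count passed to MemBufInputSource above: a hand-counted length is easy to get wrong by one. For `"<root>hello xml</root>"`, `sizeof` on the literal gives 23 because it counts the terminating NUL, while the XML text itself is 22 bytes. The sketch below uses only the standard library; `xmlByteCount` is a hypothetical helper name, not part of Xerces.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helpers (not part of Xerces): the number of bytes to pass
// alongside a pointer to the XML text.
inline std::size_t xmlByteCount(const char* text) {
    return std::strlen(text);  // stops at the terminating NUL and does not count it
}

inline std::size_t xmlByteCount(const std::string& text) {
    return text.size();  // size() likewise excludes the terminator
}
```

Either overload can supply the second constructor argument, for example `xmlByteCount(myXMLBufString)`, so the count stays correct if the XML text changes.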

chars