
Apologies in advance for the long read; I have tried to keep it as succinct as possible. Feel free to ask me to elaborate on any aspect, or to edit out parts to improve this question.

Background:
We have an application that can interface with many modules (daemons). Each module is configured from an XML document stored in a database.

We sometimes have to debug issues caused by someone manually editing the XML configuration (at HQ or on site), so I have taken it upon myself to validate the XML at runtime and catch these configuration issues earlier:

/*---------------------------------------------------------------------------
**
*/
bool BaseApplication::validateXML(const QDomDocument& doc, const QString& schema)
{
    QFile xsdfile(schema);
    if (xsdfile.open(QFile::ReadOnly)) {

        QXmlSchema xsd;
        xsd.load(&xsdfile);

        if (xsd.isValid()) {
            QXmlSchemaValidator validator(xsd);

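            const QString xml = doc.toString();
            // Serialize as UTF-8, the encoding an XML parser assumes when
            // the document carries no encoding declaration.
            return validator.validate(xml.toUtf8());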
        }
        else {
            qWarning("BaseApplication::validateXML() - schema \"%s\" is not well-formed", qPrintable(schema));
        }
    }
    else {
        qWarning("BaseApplication::validateXML() - unable to open schema \"%s\"", qPrintable(schema));
    }

    return false;
}

All modules have an application class that derives from this BaseApplication. It's part of a larger framework.

So an application at bootstrap could do something like:

void Application::preBegin()
{
    ...
    const QDomDocument config = // load XML from database

    if (!validateXML(config, ":schema.xsd")) {
        qFatal("Application::preBegin() - configuration is not valid");
    }
    ...
}

But doing so remains completely optional (per module). This all works fine in our engineering environment.

The problem:
For {reasons too complicated and distracting}, the department that sets up and maintains the larger customer systems has written scripts that generate the various module XML configurations and include (among other things) a noNamespaceSchemaLocation attribute.

So instead of a typical module configuration looking like this:

<configuration>
  <!--XML struct here-->
</configuration>

They are written like:

<configuration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" schema_version="1" xsi:noNamespaceSchemaLocation="{url to XSD file on intranet server}">
  <!--XML struct here-->
</configuration>

(The modules treat these extra attributes as metadata.)

This causes problems for modules that have built-in XSD validation. At runtime, while the module's XML configuration is being loaded and validated, we see odd behaviour (UB?). Even though this is a synchronous process in a single thread, the debug log shows events occurring before Application::preBegin() that should happen afterwards. We also lose a full minute (to the second) in XML validation.

Manually removing the noNamespaceSchemaLocation attribute from the root element fixes the issue. Validation then takes next to no time, and the log shows events in the right order.

The question:
Is there anything I can add to, or call in, BaseApplication::validateXML() that would make the routine ignore any mention of noNamespaceSchemaLocation in the XML it is validating?

iwarv

1 Answer

The one-minute delay could be a TCP/IP timeout: the validation routine may be trying to resolve (download) the XML schema from the location given in the document.

What you could do is strip all the namespaces from the config file before validating it. Parse the original config into an XmlDocument and use the solution from the question "How to remove all namespaces from XML with C#?" to remove the namespaces from it. The cleaned string can then be validated against the schema.

martijn
I think that is indeed the way to go. For my purpose I probably only need to strip the attribute `noNamespaceSchemaLocation` at the start of `BaseApplication::validateXML()`. Thanks. – iwarv Jul 15 '21 at 08:38