
I'm running into a strange problem that I cannot solve, and nobody else seems to have reported it. Loading a small XML file (4 MB) works fine and the program runs normally, but when I try to load a larger file (200 MB) the program crashes without any error, even in debug mode. It does not print the error messages either, since the program crashes before they are reached. Thanks for helping.

The error log from Qt Creator is:

The program has unexpectedly finished. C:/Parser [path]/XmlDOM crashed

Code:

QFile file("./file.osm");
qDebug() << file.exists();
qDebug() << file.size();

QString errorStr;
int errorLine;
int errorColumn;

QDomDocument document;

if(!file.open(QIODevice::ReadOnly | QIODevice::Text))
{
    qDebug() << "Failed to open file";
    return -1;
}
else
{
    if(!document.setContent(&file, false, &errorStr, &errorLine, &errorColumn)) // the program crashes here
    {
        std::cerr << "Error: Parse error at line " << errorLine << ", "
                  << "column " << errorColumn << ": "
                  << qPrintable(errorStr) << std::endl;
        return -1;
    }
    qDebug() << file.isReadable(); // prints true with small files
    file.close();
}
QDomElement root = document.firstChildElement();

Solution:

Activate a swap partition or get more RAM; the program crashes because the PC is/was out of RAM. The updates are left in the question only to document my further steps.

Update: I installed everything on another machine. Now I'm getting some output:

Error: Parse error at line 1, column 1: unexpected end of file

Funny thing: now even the small files are not working and return the error. After some digging I found that some people had set the path to the file wrong, so I checked my path.

qDebug() << "File exists: " << file.exists(); 
qDebug() << "File path: " << QFileInfo(file).absoluteFilePath();
qDebug() << "File size: " << file.size();    

This returns: true, /path/to/file/file.osm , correct size

I also checked whether my XML files are valid, and they are. So any new suggestions? So far I'm stuck.

Update 2: First, thank you for your answers! One desperate attempt was:

else{
  document.setContent(&file); //passed and worked, funny
}
qDebug() << file.isReadable(); 
file.close();

This finally works with the larger and smaller files on the new setup:

else
{
    if(!document.setContent(&file))
    {
        std::cerr << "Error: Open file" << std::endl;
        return -1;
    }
    qDebug() << file.isReadable();
    file.close();
}

Why I used a DOM parser: the XML structure is like this:

<osm>
    <node id="1" lat="value" lon="value" />
    <node id="2" lat="value" lon="value" />
    <node id="3" lat="value" lon="value" />

    <way id="12345">
      <nd ref ="1"/>
      <nd ref ="2"/>
      <nd ref ="3"/>
    </way>
</osm>

I want to rebuild the way points, for which the lat/lon values from the nodes are necessary. To do this I want to match each way's ref id with the node id and pull the values into the way. Is a SAX parser the better solution for this? I thought that with the DOM tree I could easily go through the nodes and match the ids without parsing the complete XML again. I'm using Ubuntu and, with the new setup, Qt 5. I have a 2nd-generation i5 and 8 GB of RAM, which is full when the big file is being processed. One run in release mode takes 50 minutes for 1000 ways with 5-50 nodes each.

Xref_failed
  • What is written by debugger during that crash? Stack trace? – Orest Hera Nov 06 '15 at 01:59
  • @Orest Hera This is the only thing I get out of the debugger: Debugging starts &"warning: GDB: Failed to set controlling terminal: Inappropriate ioctl for device\n" Debugging has finished – Xref_failed Nov 06 '15 at 14:52
  • Still nothing, even with: `try{ !document.setContent(&file); } catch (const std::exception& e){ std::cerr << "exception: " << e.what() << std::endl; } ` – Xref_failed Nov 06 '15 at 15:31
  • _"now even the small files are not working"_ : it must be some other error with small files on another machine. Maybe there is some wrong symbol in the edited small XML. – Orest Hera Nov 07 '15 at 12:29
  • I'm glad that you have made some progress, but now several different issues are mixed together. At first you worked with Qt4 on Windows, and the RAM limit was the issue. Now you have Qt5 and 64-bit Ubuntu, so it can work. It is not clear what difference the "update 2" code makes. It should work in the same way as the initial code. – Orest Hera Nov 07 '15 at 20:13
  • Another issue is performance. I do not understand which approach takes 50 minutes (`QDomDocument` or some SAX). The task can be greatly optimized. At most 2 passes are needed to replace all `way/nd ref` by `lat,lon`. In one pass it is possible to create a vector of `lat,lon` pairs. If `value` is represented as `float` it takes 8 bytes per node, so 8 MB per 1 million nodes. That can be kept in RAM (even in a char array representation for `value` to avoid `float` issues). With such a dictionary, the `way ref` ids can be matched in one pass. It should take several seconds. – Orest Hera Nov 07 '15 at 20:19

2 Answers


The Qt XML module and QDomDocument are not meant to be used with very large XML documents.

A QDomDocument object keeps the entire XML document structure in RAM. A file size of 200 MB is a critical threshold, since with such files QDomDocument can use 2 GB of RAM.

The available Qt4 releases for Windows are built with 32-bit compilers, so that is the limit for such applications; see, for example, "How much memory can a 32 bit process access on a 64 bit operating system?"

In general, entire large XML documents should not be loaded into RAM. Such documents should be handled by stream parsers.

On the other hand, if the XML document is not much larger than 200 MB, the project already works with QDomDocument, and there is enough RAM on the PC (8 GB - 16 GB), it is possible to compile the project with a 64-bit compiler. In that case Qt4 has to be compiled manually. Also, a Release build may use half as much RAM as a Debug build.

Orest Hera

The solution is to activate a swap partition or get more RAM; the program crashed because the PC was out of RAM. An even better solution is to use a SAX parser.

Regarding the runtime problem: use maps instead of vectors. The runtime dropped to ~20 secs for a big file.
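To make the map idea concrete, here is a minimal sketch in plain C++. It assumes the node coordinates have already been read into a hash map keyed by node id; `LatLon` and `resolveWay` are hypothetical names for this illustration, and `std::unordered_map` is just one possible choice of map type, not necessarily the one used in the actual program.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper type for this sketch.
struct LatLon {
    double lat;
    double lon;
};

// Turn the node ids referenced by one <way> into coordinates.
// Each lookup is O(1) on average, instead of a linear scan over a vector of nodes.
std::vector<LatLon> resolveWay(const std::unordered_map<std::int64_t, LatLon> &nodes,
                               const std::vector<std::int64_t> &refs)
{
    std::vector<LatLon> points;
    points.reserve(refs.size());
    for (std::int64_t ref : refs) {
        auto it = nodes.find(ref);
        if (it != nodes.end()) {   // refs without a matching node are skipped
            points.push_back(it->second);
        }
    }
    return points;
}
```

With a vector of nodes, every `nd ref` needs a search through all nodes; with a map, the cost per way depends only on the number of refs in that way.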

Xref_failed