
I want to read a big file and then write it to a new file using Qt.

I have tried to read a big file that contains only one line. I tested with readAll() and readLine().

If the data file is about 600 MB, my code runs, although slowly.

If the data file is about 6 GB, my code fails.

Can you give me some suggestions?

Update
My test code is as follows:

#include <QApplication>
#include <QFile>
#include <QTextStream>
#include <QTime>
#include <QDebug>
#define qcout qDebug()

void testFile07()
{
    QFile inFile("../03_testFile/file/bigdata03.txt");
    if (!inFile.open(QIODevice::ReadOnly | QIODevice::Text))
    {
        qcout << inFile.errorString();
        return;
    }

    QFile outFile("../bigdata-read-02.txt");
    if (!outFile.open(QIODevice::WriteOnly | QIODevice::Truncate))
        return;

    QTime time1, time2;
    time1 = QTime::currentTime();
    while(!inFile.atEnd())
    {
        QByteArray arr = inFile.read(3*1024);
        outFile.write(arr);
    }
    time2 = QTime::currentTime();
    qcout << time1.msecsTo(time2);
}

void testFile08()
{
    QFile inFile("../03_testFile/file/bigdata03.txt");
    if (!inFile.open(QIODevice::ReadOnly | QIODevice::Text))
        return;

    QFile outFile("../bigdata-readall-02.txt");
    if (!outFile.open(QIODevice::WriteOnly | QIODevice::Truncate))
        return;

    QTime time1, time2, time3;
    time1 = QTime::currentTime();

    QByteArray arr = inFile.readAll();
    qcout << arr.size();
    time3 = QTime::currentTime();
    outFile.write(arr);    // write the data already read; a second readAll() here would return an empty array

    time2 = QTime::currentTime();
    qcout << time1.msecsTo(time2);

}

int main(int argc, char *argv[])
{
    testFile07();
    testFile08();

    return 0;
}

After testing, here is what I found:

  • read() and readAll() read at about the same speed; in fact, read() is slightly faster.
  • The real difference is in writing.

For a 600 MB file:
  • Using read(), reading and writing takes about 2.1 s, with 875 ms for reading.
  • Using readAll(), reading and writing takes about 10 s, with 907 ms for reading.

For a 6 GB file:
  • Using read(), reading and writing takes about 162 s, with 58 s for reading.
  • Using readAll(), the reported size is 0 and the program fails to run correctly.

Cœur
JosanSun
    Please post a [mcve]. – Jesper Juhl Apr 05 '18 at 13:00
    I do this in `Qt` with files that are 10 times as large as the sizes you are mentioning. With that said, this is from a local disk. I did in the past have problems with reading similar sized files (a few hundred MB) over a samba network share on a windows client. In that case I had to break the read up into smaller chunks. – drescherjm Apr 05 '18 at 13:14
  • Don't use `readAll()`. That's just a quick hack for programmers too lazy to think about how to process their input as a stream. If you really, really need random access to the whole file, consider memory-mapping it (you can use `boost::interprocess` if native `mmap()` is insufficiently portable). – Toby Speight Apr 05 '18 at 15:09
  • @JesperJuhl Buddy, I have edited my post. Is it ok now? – JosanSun Apr 05 '18 at 18:50
  • You should benchmark with different buffer sizes in the read() call to find out what buffer results in the best performance. – Paul Belanger Apr 25 '18 at 17:21

3 Answers


Open both files as QFiles. In a loop, read a fixed number of bytes, say 4K, into an array from the input file, then write that array into the output file. Continue until you run out of bytes.

However, if you just want to copy a file verbatim, you can use QFile::copy.

Paul Belanger

You can use QFile::map and write the mapped memory to the target file in a single shot:

void copymappedfile(QString in_filename, QString out_filename)
{
    QFile in_file(in_filename);
    if(in_file.open(QFile::ReadOnly))
    {
        QFile out_file(out_filename);
        if(out_file.open(QFile::WriteOnly))
        {
            const qint64 filesize = in_file.size();
            uchar *mem = in_file.map(0, filesize, QFileDevice::MapPrivateOption);
            if (mem)    // map() returns nullptr on failure, e.g. when the file does not fit in the address space
            {
                out_file.write(reinterpret_cast<const char *>(mem), filesize);
                in_file.unmap(mem);
            }

            out_file.close();
        }
        in_file.close();
    }
}
p-a-o-l-o
  • I have tested your code. When the size of the file is **600MB**, copying the file only costs about **1.6s**. Great! When the size of the file is **6GB**, the code fails. – JosanSun Apr 07 '18 at 06:01

One thing to keep in mind: with read() you specify a maximum size for each chunk (3*1024 bytes in your example), while with readAll() you tell the program to read the entire file at once.

In the first case you repeatedly allocate a 3072-byte buffer, write it out, and release it at the end of each loop iteration. In the second case you allocate a buffer for the entire file on the heap. Allocating 600 MB at once may be the reason for your performance issues; trying to allocate 6 GB at once can simply exhaust memory or address space (a 32-bit process has at most 4 GB of address space), causing your program to crash.

CharonX