
I'm on Windows, and I'll only be working with Windows.

I have a question about opening very large files (PTX).

Each line contains the coordinates of one point: X Y Z I {R G B} (the {R, G, B} values are not always present).

Since my files are huge (sometimes > 100 GB), I would like to read them quickly using a memory map (I have never done that before), or at least read chunks of memory instead of reading the file line by line.

My question is: if I read chunks of memory using

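#include <cstddef>
#include <fstream>
#include <memory>

int main()
{
    std::ifstream bigFile("mybigfile.dat", std::ios::binary); // binary: no newline translation
    constexpr std::size_t bufferSize = 1024 * 1024;
    std::unique_ptr<char[]> buffer(new char[bufferSize]);
    while (bigFile)
    {
        bigFile.read(buffer.get(), bufferSize);
        const std::streamsize bytesRead = bigFile.gcount(); // the last chunk is usually shorter
        // process bytesRead bytes of data in buffer
    }
}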

for example, is there a way to be sure that my buffer won't stop in the middle of a line?
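A fixed-size read cannot guarantee this, because it can stop anywhere, including in the middle of a line. A common workaround is to pass only complete lines to the parser. The bytes after the last '\n' are carried over to the front of the next chunk.

The following is a minimal sketch of that idea. It assumes a plain std::istream, for example a std::ifstream opened with std::ios::binary. The name `forEachLine` and its callback interface are hypothetical and are not part of any library.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, only for the usage example
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: reads `in` in chunks of `chunkSize` bytes and calls
// onLine(line) once per complete line. The bytes after the last '\n' of a
// chunk are kept and the next chunk is appended after them, so a line is
// never split. The string_view is only valid during the callback.
template <typename OnLine>
void forEachLine(std::istream& in, std::size_t chunkSize, OnLine onLine) {
    std::vector<char> buf;                 // carried-over tail, then new data
    for (;;) {
        const std::size_t carry = buf.size();
        buf.resize(carry + chunkSize);
        in.read(buf.data() + carry, static_cast<std::streamsize>(chunkSize));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        buf.resize(carry + got);           // keep only the bytes actually read

        std::size_t start = 0;             // start of the current line
        // The carried tail contains no '\n', so scanning starts at `carry`.
        for (std::size_t k = carry; k < buf.size(); ++k) {
            if (buf[k] == '\n') {
                onLine(std::string_view(buf.data() + start, k - start));
                start = k + 1;
            }
        }
        if (got == 0) {                    // end of input (or read error)
            if (start < buf.size())        // last line without a final '\n'
                onLine(std::string_view(buf.data() + start, buf.size() - start));
            return;
        }
        // Drop the consumed lines; the incomplete tail moves to the front.
        buf.erase(buf.begin(), buf.begin() + static_cast<std::ptrdiff_t>(start));
    }
}
```

Usage would look like `forEachLine(bigFile, 1024 * 1024, [&](std::string_view line) { /* build one point */ });`. If a single line is longer than `chunkSize`, the buffer simply grows until that line is complete.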

For example, my file looks like this:

x1 y1 z1 i1 r1 g1 b1
x2 y2 z2 i2 r2 g2 b2
x3 y3 z3 i3 r3 g3 b3  
x4 y4 z4 i4 r4 g4 b4
x5 y5 z5 i5 r5 g5 b5

and I want to create a std::vector<Point>. So I read a buffer-sized chunk of this file into the buffer, and then I take the data from the buffer to create my points. But how can I be sure that the buffer won't stop at r3?

If the buffer contains x1 y1 z1 i1 r1 g1 b1 x2 y2 z2 i2 r2 g2 b2 x3 y3 z3 i3 r3, I can't create a point from x3, y3, z3, i3 and r3 alone; I would need g3 and b3 too.

Is there a way to handle that? I hope this is understandable; English isn't my native language, and I'm not sure I explained it well...
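One way to handle this is to have the parsing step report how far it got. A trailing fragment such as `x3 y3 z3 i3 r3` is then left alone instead of being turned into a point.

The following is a hedged sketch of that approach. It assumes every data line has either 4 numbers (X Y Z I) or 7 numbers (X Y Z I R G B). It ignores the PTX header lines. `Point` and `parseCompleteLines` are hypothetical names.

```cpp
#include <cctype>
#include <cstdlib>
#include <vector>

// Hypothetical point type: X Y Z I, with R G B present only on some lines.
struct Point {
    double x = 0, y = 0, z = 0, i = 0;
    bool hasColor = false;
    double r = 0, g = 0, b = 0;
};

// Parses every complete line (terminated by '\n') in [begin, end) and
// returns a pointer to the first byte that was not consumed. A partial last
// line such as "x3 y3 z3 i3 r3" is left untouched for the next refill.
// Lines with a value count other than 4 or 7 are skipped in this sketch.
const char* parseCompleteLines(const char* begin, const char* end,
                               std::vector<Point>& out) {
    const char* line = begin;
    for (;;) {
        const char* nl = line;
        while (nl != end && *nl != '\n') ++nl;
        if (nl == end) return line;           // incomplete line: stop here

        double v[7];
        int count = 0;
        const char* s = line;
        while (count < 7) {
            // Skip blanks (including '\r') but never go past this line's '\n'.
            while (s != nl && std::isspace(static_cast<unsigned char>(*s))) ++s;
            if (s == nl) break;
            char* next = nullptr;
            v[count] = std::strtod(s, &next); // stops at '\n' at the latest
            if (next == s) break;             // not a number: ignore the rest
            ++count;
            s = next;
        }
        if (count == 4 || count == 7) {
            Point p;
            p.x = v[0]; p.y = v[1]; p.z = v[2]; p.i = v[3];
            if (count == 7) {
                p.hasColor = true;
                p.r = v[4]; p.g = v[5]; p.b = v[6];
            }
            out.push_back(p);
        }
        line = nl + 1;
    }
}
```

After each call, the caller moves the bytes between the returned pointer and the end of the data to the front of the buffer, then reads the next chunk after them. The function only needs a contiguous [begin, end) range, so it could also be called on a memory-mapped view.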

Raph Schim
  • _But how can I be sure that the buffer won't stop at r3?_ You cannot. Parsers usually work with some kind of buffer. When reading the next character hits the end of the buffer, all or half of the buffer is refilled. (Refilling half of the buffer is useful if "backwards reading" is necessary.) C++ std I/O streams support such buffering as well, and you can even change the buffering behavior through their API. I.e., if done correctly, the parser will not "notice" when the buffer is refilled. – Scheff's Cat Oct 04 '18 at 10:14
  • This could be implemented as a low-level read (filling the buffer) nested inside a high-level read (reading typed contents). The high-level read takes its characters from the low-level read. If the low-level read hits the end of its internal buffer, it tries to refill it (or returns an error if that fails). With your idea of memory-mapped files, "filling the buffer" could literally mean mapping new pages of the file. (It's been a while since I worked with memory-mapped files myself; I'm afraid I've forgotten the details.) – Scheff's Cat Oct 04 '18 at 10:27
  • I see! So if I understand correctly (once again, I'm not sure about that), the easiest method would be to read through the whole buffer, and when I reach its end: if it's the end of a line, fine; if not, I move the pointer back to the beginning of the line I was reading and refill the buffer from there? So checking whether there are 7 values left in the buffer before creating my point, and refilling the buffer from that position if not, would work? – Raph Schim Oct 04 '18 at 11:44
  • You don't use bigFile.getline() because of performance issues? – undermind Oct 04 '18 at 12:14
  • Hmm. Not sure whether you got it right. Imagine a `class FileMapBuffer` that is responsible for managing the mapped file and the mapping into memory (similar to a `std::istream` class). It has a method to read characters. When this read method is called, it checks whether the end of the buffer has been reached. If so, it simply tries to refill the buffer, i.e., to map a new page of the mapped file. (Otherwise, it just returns the next character in the current page/map.) The `getline()` calls this read method until it receives a delimiter. So the refilling of the buffer happens in `FileMapBuffer`, unnoticed by `getline()`. – Scheff's Cat Oct 04 '18 at 12:19
  • Maybe this is of interest: I recently answered [SO: How to convert QByteArray to std::istream or std::ifstream?](https://stackoverflow.com/a/52492027/7478597). The last sample shows a derived `std::istream` with customized buffering. In that case, the reason was to provide a constant buffer from outside that is neither copied nor changed. A similar solution could provide a memory page from a mapped file. The nice thing about this solution is that it can be used like a regular stream, e.g., with `operator>>()`. – Scheff's Cat Oct 04 '18 at 12:22
  • @undermind I had some minor performance issues with lighter files before, and here I will need to do better than that; that's why I'm asking ^^ Scheff, thanks a lot! I will take a look at what you sent; I think it's confusing me for now :/ I'll read it slowly and try to understand everything! Thanks! – Raph Schim Oct 04 '18 at 12:27
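Scheff's suggestion is to let the buffer refill itself without `getline()` noticing. In standard C++, this corresponds to `std::streambuf::underflow()`, which the stream calls whenever its get area is exhausted.

The following is an illustrative sketch under stated assumptions, not the code from the linked answer. The hypothetical `ChunkedBuf` refills a fixed-size buffer from another `std::istream`. That source stands in for mapping the next view of a file; the Windows mapping calls are not shown.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, only for the usage example
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical stream buffer: whenever the current chunk is used up,
// underflow() refills it from `source`. Readers such as std::getline or
// operator>> never see the chunk boundaries.
class ChunkedBuf : public std::streambuf {
public:
    ChunkedBuf(std::istream& source, std::size_t chunkSize)
        : source_(source), chunk_(chunkSize) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr())                  // data still available
            return traits_type::to_int_type(*gptr());
        // Refill; a memory-mapped version would map the next view here.
        source_.read(chunk_.data(), static_cast<std::streamsize>(chunk_.size()));
        const std::streamsize n = source_.gcount();
        if (n <= 0)
            return traits_type::eof();        // nothing left to refill with
        setg(chunk_.data(), chunk_.data(), chunk_.data() + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::istream& source_;
    std::vector<char> chunk_;
};
```

Usage could look like `ChunkedBuf buf(source, 1 << 20); std::istream in(&buf);`. Then `std::getline(in, line)` or `in >> x >> y >> z` work even when a number straddles two chunks. This sketch does not support `unget()` across a chunk boundary.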
