
I've read this question and then this one about how to efficiently read a large amount of text (floats, in the second question) in C++ using the boost::spirit library.

From what I've seen, the solutions proposed in those questions read the whole text, whereas I need to read only a portion of the input (for example, from char x to char y).

Can I use the library above for this purpose? If not, how could I do it efficiently?

justHelloWorld

1 Answer


You don't even need to map a subsection of the file, because mmap only reserves virtual address space for the mapping. Actual pages are loaded on demand, so you could map an entire 12 GiB file even if you have only, say, 4 GiB of physical RAM (not even requiring swap).
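To make that concrete, here is a minimal sketch using plain POSIX `mmap` rather than Boost. The `MappedFile` class and its `slice` member are hypothetical names made up for illustration; they are not part of any library. It maps the whole file read-only and hands back a non-owning view of bytes `[x, y)`, so only the pages you actually touch get loaded.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper (illustration only): maps a whole file read-only.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);

        struct stat st{};
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);

        if (size_ > 0) {  // mmap rejects zero-length mappings
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed: " + path);
            }
            data_ = static_cast<const char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile() {
        if (data_) ::munmap(const_cast<char*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    std::size_t size() const { return size_; }

    // Non-owning view of bytes [x, y). No data is copied; pages are
    // faulted in only when the view is actually read.
    std::string_view slice(std::size_t x, std::size_t y) const {
        assert(x <= y && y <= size_);
        return std::string_view(data_ + x, y - x);
    }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

The returned view is just a pair of pointers into the mapping, so it can be fed to any parser that accepts an iterator range, as long as the `MappedFile` outlives it.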

If your file is text-based, you will want to find the start of a line from an arbitrary location in the file.
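A rough sketch of that alignment step, assuming `'\n'` line endings and that the data is already visible as a `std::string_view` (for example, over a mapped file). The helper names `line_start`, `next_line_start` and `owned_lines` are mine, chosen for illustration. The convention used here is that a range `[x, y)` "owns" every line whose first character falls inside it, so adjacent ranges split the file without overlapping or dropping lines.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Offset of the first character of the line that contains `pos`.
std::size_t line_start(std::string_view data, std::size_t pos) {
    assert(pos <= data.size());
    if (pos == 0) return 0;
    // Search backwards for the newline that ends the previous line.
    std::size_t nl = data.rfind('\n', pos - 1);
    return nl == std::string_view::npos ? 0 : nl + 1;
}

// Offset just past the next '\n' at or after `pos`, or data.size() if none.
std::size_t next_line_start(std::string_view data, std::size_t pos) {
    std::size_t nl = data.find('\n', pos);
    return nl == std::string_view::npos ? data.size() : nl + 1;
}

// All complete lines whose first character lies in [x, y).
std::string_view owned_lines(std::string_view data, std::size_t x, std::size_t y) {
    assert(x <= y && y <= data.size());
    // Round an offset up to a line boundary unless it already is one.
    auto align = [&](std::size_t p) {
        return line_start(data, p) == p ? p : next_line_start(data, p);
    };
    std::size_t first = align(x);
    std::size_t last = align(y);
    return data.substr(first, last - first);
}
```

With this convention a line that straddles `y` belongs to the next range, which is usually what you want when splitting one large file across several workers.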

An example of something similar is in the second approach here: Using boost::iostreams::mapped_file_source with std::multimap

sehe
  • With the *enormous* caveat that mapping the entire file is only safe on 64 bit architectures, and some would say that not mapping the exact portion you need when that is a well defined and known subset is sloppy programming. – Niall Douglas Dec 22 '15 at 17:17
  • And some would say that's not an enormous caveat ;) – sehe Dec 22 '15 at 17:56
  • It's safe everywhere. It just might not work everywhere – sehe Dec 22 '15 at 17:57
  • No, it's not safe everywhere. As I know you know sehe, if you claim all the big contiguous chunks of address space on 32 bit leaving just little fragments between DLLs etc, you can get some very weird and unpredictable errors and failures not just in your own code but in the OS code. Essentially running out of address space is as buggy and unpredictable as running out of memory is - most code, including OS code, doesn't cope well with it happening due to a lack of testing/who cares/it's easily avoidable anyway. – Niall Douglas Dec 23 '15 at 08:07
  • @NiallDouglas sounds like a reasonable thing to happen. It's just out of memory really. I wonder what OS would cope so badly. Sounds like a reasonable motive to map sub regions indeed – sehe Dec 23 '15 at 08:12
  • I've experienced the problem on Windows, Linux and QNX :). Unfortunately ENOMEM is not consistently returned and therefore behaviour is not consistent either, I've seen EINVAL, EAGAIN, and a few others. I think OS devs just decided it wasn't worth the debugging effort given 64 bit makes the problem go away permanently. – Niall Douglas Dec 23 '15 at 08:32
  • No problem folks: this is a Xeon phi/64 bit dedicated project. – justHelloWorld Dec 23 '15 at 11:05