
To be frank, I have an assignment that says, quite vaguely,

"If the file exists, the one-argument constructor allocates memory for the number of records contained in the file and copies them into memory."

Now, in considering this instruction, it would seem I am to allocate the dynamic memory /before/ copying the data over, and this seems, in principle, impossible.

To dynamically allocate memory, to my knowledge, you require runtime definition of the size of the block to be reserved.

Given that the file size, or number of 'entries' is unknown, how can one possibly allocate that much memory? Does not the notion defeat the very purpose of dynamic allocation?

Solution-wise, it would seem the only option is to parse the entire file to determine its size, allocate the proper amount of memory, and then read through the file again, copying the data into the allocated memory.

Given that this must be a common operation in any program that reads file data, I wonder: What is the proper, or most efficient way of loading a file into RAM?

The notion of reading once to determine the size, and then again to copy seems very inefficient. I assume there is a way to jump to the end of the file to determine max length, which would make the process faster. Or perhaps using a static buffer and loading that in blocks to RAM?

Is it possible to read all of the data, and then move it into dynamic memory using the move operator? Or perhaps more efficient to use a linked list of some kind?

bigcodeszzer
  • Possible duplicate of [What is the Fastest Method for High Performance Sequential File I/O in C++?](http://stackoverflow.com/questions/1201261/what-is-the-fastest-method-for-high-performance-sequential-file-i-o-in-c) – erip Feb 02 '16 at 04:38
  • It's difficult to suggest anything useful without seeing the contents of a sample file. – R Sahu Feb 02 '16 at 04:38
  • @RSahu It's just a text file. – bigcodeszzer Feb 02 '16 at 04:40
  • @RSahu Unknown length of characters. – bigcodeszzer Feb 02 '16 at 04:40
  • If the record size is variable and you have to allocate all the memory up front then there's no way to do it other than effectively reading the file contents twice. – Jonathan Potter Feb 02 '16 at 04:41
  • "Now, in considering this instruction, it would seem I am to allocate the dynamic memory /before/ copy the data over, and this seems in principle, impossible." - Not necessarily. There is probably file metadata that would tell you what the size of the file is without you having to read the contents of the file first. – Michael Hewson Feb 02 '16 at 04:43
  • @mikeyhew If I know it's just a .txt file, is there a way to divide out the number of chars using the file size? That being said, aren't there usually markup code characters, which will lead to an error? Finally, if there is metadata, doesn't that mean that someone must have already parsed the file, and just included the size somewhere? That wouldn't exactly be a solution. – bigcodeszzer Feb 02 '16 at 04:46
  • @JonathanPotter Not even with move semantics? – bigcodeszzer Feb 02 '16 at 04:47
  • "To dynamically allocate memory, to my knowledge, you require runtime definition of the size of the block to be reserved." - not true, actually, the opposite is true. – Michael Hewson Feb 02 '16 at 04:47
  • @bigcodeszzer if you are using the C/C++ `char` type, then it's the size of a byte, so the number of `char`s will be the same as the number of bytes in the file. – Michael Hewson Feb 02 '16 at 04:51
  • @bigcodeszzer, see http://stackoverflow.com/questions/6195304/using-fread-to-read-the-contents-of-a-file-into-a-structure. – R Sahu Feb 02 '16 at 04:52
  • @bigcodeszzer but note that if the text encoding is UTF-8 or another variable width encoding, if you want to know the actual number of unicode characters in the file you have to read through the whole contents. – Michael Hewson Feb 02 '16 at 04:53
  • Saying the assignment implies the dynamic allocation must be done at a particular point is, I think, misinterpretation. Anyway - barring any vague definition of size like '# of variable-width text lines' or whatever - assuming raw number of bytes, you can easily open a file without allocating memory for it, use seeking to get the size, then allocate for the result. Simply reading the size of a file doesn't require allocation, even if it must be opened to do so. Even reading text, e.g. a leading record count, probably doesn't need allocation - at least by you (stream objects might do their own) – underscore_d Feb 02 '16 at 14:19

2 Answers


Often the most efficient method is to have the operating system map the file to memory. Search your OS API for "mmap" or "memory mapping".

Another approach is to seek to the end of the file and get the position (tellg()). For a stream opened in binary mode, this is the size of the file in bytes. Allocate an array in dynamic memory or create a std::vector with at least this amount of space.

Some operating systems have an API you can call to get the size of a file (without having to seek to the end). You could use this method, then dynamically allocate the memory or use std::vector<char>.

You will need to come up with a plan if the file doesn't fit into memory.

If you need to read the entire file into memory, you could use istream::read using the file length.
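Putting the seek, tellg() and istream::read steps together might look like the sketch below. The function name `read_whole_file` is an assumption for illustration. Note that it uses `resize` and not `reserve`, because `read` writes into elements that must already exist.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical helper (not from the original answer): loads the whole file
// into `data`. Returns false if the file cannot be opened or read.
bool read_whole_file(const char* path, std::vector<char>& data) {
    std::ifstream is(path, std::ios::binary);  // binary: no newline translation
    if (!is) return false;

    is.seekg(0, std::ios::end);              // jump to the end
    const std::streamoff size = is.tellg();  // position at the end == byte count
    if (size < 0) return false;
    is.seekg(0, std::ios::beg);              // back to the start

    data.resize(static_cast<std::size_t>(size));  // one allocation, exact size
    if (size > 0 && !is.read(data.data(), size)) return false;
    return true;
}
```

This reads the contents only once. Finding the size costs two seeks, not a pass over the data.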

Thomas Matthews
  • Your 2nd paragraph is how most people would do this, and certainly how I do it. Simply `tell`ing the size of the file does not require any allocation. – underscore_d Feb 02 '16 at 14:20

It all depends on file format. One way to store records is to first write how many records are stored in file. If you have two phone numbers your file might look like this:

2
Jon
555-123
Mary
555-456

In this case the solution is straightforward:

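// ...
int count = 0;
is >> count; // the first value in the file is the number of records
record_type *record = new record_type[count]; // release later with delete[]
for ( int i = 0; i < count; ++i )
  is >> record[i].name >> record[i].number; // stream checks omitted
// ...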

If the file does not store the number of records (I wouldn't do this), you will have to count them first, and then use the above solution:

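// ...
int count = 0;
std::string dummy;
while ( is >> dummy >> dummy ) // each record is two whitespace-separated tokens
  ++count;
is.clear(); // reset eofbit/failbit set by the last failed read
is.seekg( 0 ); // rewind so the records can be read again
// ...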

A second solution for the second case would be to write a dynamic container (I assume you are not allowed to use standard containers) and push the records as you read them:

// ...
list_type list;
record_type r;
while ( is >> r.name >> r.number )
  list.push_back( r );
// ...
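One possible shape for such a container is a growable array that doubles its capacity when full. This is only a sketch: the class name `record_list`, the starting capacity of 4 and the `record_type` layout are all assumptions, and a linked list would work as well.

```cpp
#include <cstddef>
#include <string>
#include <utility>  // std::move

struct record_type {  // assumed layout: one name and one number per record
    std::string name;
    std::string number;
};

// Hypothetical minimal growable array of records.
class record_list {
public:
    record_list() : data_(nullptr), size_(0), capacity_(0) {}
    ~record_list() { delete[] data_; }
    record_list(const record_list&) = delete;             // copying disabled
    record_list& operator=(const record_list&) = delete;  // to keep it short

    void push_back(const record_type& r) {
        if (size_ == capacity_) grow();
        data_[size_++] = r;
    }
    std::size_t size() const { return size_; }
    const record_type& operator[](std::size_t i) const { return data_[i]; }

private:
    void grow() {
        const std::size_t new_cap = capacity_ == 0 ? 4 : capacity_ * 2;
        record_type* bigger = new record_type[new_cap];
        for (std::size_t i = 0; i < size_; ++i)
            bigger[i] = std::move(data_[i]);  // move, don't copy, the strings
        delete[] data_;
        data_ = bigger;
        capacity_ = new_cap;
    }

    record_type* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

With this approach the file is read only once, at the price of occasional reallocations. Moving the elements during growth keeps those reallocations cheap, which is about as far as move semantics can help here.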

The solutions are ordered by complexity. I did not compile the examples above.

zdf