0

I want to read in a file called doc.txt of arbitrary length without getting a segfault. I know I have to allocate on the heap, but I am having trouble doing so. Assume I have no way of knowing or obtaining the file size: I just want to read however much there is and allocate it on the heap. The only limit should be the amount of physical memory the machine has.

#include <iostream> 
#include <string>   
#include <fstream>  

using namespace std ;

int main() {

    const char *file_name = "doc.txt" ; 

    ifstream fin ;
    fin.open( file_name ) ;

    if( ! fin ) {
         cout << "Problems opening " << file_name << endl ;
         return -1 ;
    }

    const unsigned MAX = 100 ; 
    string doc[MAX] ;

    unsigned word_count = 0 ;
    //while( fin >> doc[ word_count++ ] ) ;

    while( fin >> doc[ word_count ] ) {
         cout << doc[ word_count ] << endl ;
         word_count ++ ;
    }
    fin.close() ;

    return 0 ;
}   
trincot
ourmando

4 Answers

2

Well, if I understand what you are doing here, you are reading into an array called doc. However, a built-in array like this one has its size fixed at compile time, so you are not meeting your goal of making this dynamic. There are ways to do this; for example, use a vector, like so:

vector<string> doc;
string newestInput;
while( fin >> newestInput ) {
     cout << newestInput << endl ;
     doc.push_back(newestInput);
}
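
For completeness, here is a minimal sketch of the same idea wrapped in a function. The name read_words is made up for illustration, and passing a std::string to the ifstream constructor assumes C++11 or later. The vector grows on the heap as words arrive, so nothing needs to be known about the file size up front.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): reads every
// whitespace-separated word from the file at `path`.
std::vector<std::string> read_words(const std::string& path) {
    std::vector<std::string> doc;
    std::ifstream fin(path);        // std::string overload needs C++11
    if (!fin) {
        return doc;                 // could not open: return an empty vector
    }
    std::string word;
    while (fin >> word) {           // the extraction itself is the loop test
        doc.push_back(word);        // vector reallocates on the heap as needed
    }
    return doc;
}
```

Calling read_words("doc.txt") and looping over the result would print the same words as the original program, without the fixed limit of 100.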
daniel gratzer
1

Not allocating all the space in advance makes sense only if you don't intend to read the whole file. If that really is the case, you have a couple of options:

  • Read the file in chunks (either of predetermined length, or lines) and store them in a std::list<std::string>. You could also use a std::vector<std::string>, but a vector re-allocates memory as it grows, which may lead to out-of-memory earlier than strictly necessary (due to fragmentation and the need to keep both the old and the new block of memory "alive" at the same time while elements are being copied).
  • Under Windows, use VirtualAlloc to reserve an amount of memory equal to the file size and then commit it page by page as you read the file. This way, you'll never commit more than you actually need.
  • Under Windows, use a memory-mapped file.
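
The first option could look roughly like the sketch below. The function name read_chunks and the default chunk size of 4096 bytes are arbitrary choices for illustration, not anything prescribed above. Each chunk is stored as its own list node, so no large contiguous block is ever re-allocated.

```cpp
#include <cstddef>
#include <fstream>
#include <list>
#include <string>

// Hypothetical helper: reads the file at `path` in fixed-size chunks and
// stores each chunk as one node of a linked list.
std::list<std::string> read_chunks(const std::string& path,
                                   std::size_t chunk_size = 4096) {
    std::list<std::string> chunks;
    std::ifstream fin(path, std::ios::binary);
    if (!fin || chunk_size == 0) {
        return chunks;              // nothing to read
    }
    std::string buffer(chunk_size, '\0');
    // read() fails on the final short chunk, but gcount() still reports
    // how many bytes were actually extracted, so that chunk is kept too.
    while (fin.read(&buffer[0], static_cast<std::streamsize>(buffer.size()))
           || fin.gcount() > 0) {
        chunks.emplace_back(buffer, 0, static_cast<std::size_t>(fin.gcount()));
    }
    return chunks;
}
```

Concatenating the nodes in order gives back the file contents. Whether the list or the vector runs out of memory first depends on the allocator, so treat this as one possible trade-off rather than a guaranteed win.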

BTW, you can get the file size in a (mostly) portable way using the _stat function (read the st_size field).

Branko Dimitrijevic
  • +1 for raising some good aspects. Re list vs vector - list may also waste memory as each node's allocated from heap, which entails management overheads and often rounding up of size (perhaps to the next power of 2). So, not as clear cut which one would fail first as it might appear. Separately, most OSes - including UNIX/Linux - do also have memory mapped files, and for ways to get file size: http://stackoverflow.com/questions/5840148/how-can-i-get-a-files-size-in-c – Tony Delroy Feb 23 '12 at 04:41
0

Your best bet is probably to use an STL container like a vector. These allow you to dynamically allocate memory as you need it.

vector<string> doc;
string word;

while ( !fin.eof() )
{
   fin >> word;
   doc.push_back(word);
   cout << word << endl;
}
Matt Phillips
  • 2
    Matt has the right idea here. A good rule of thumb when writing C++ is to be as lazy as possible. Every time you have a question about the architecture of your code, your first thought should be to look at the STL (and to a lesser extent, Boost) to see if somebody else has already solved the problem. – pg1989 Feb 23 '12 at 04:00
  • 2
    This method of testing for `eof` is a bad practice. [Please read this.](http://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong) – Blastfurnace Feb 23 '12 at 04:23
  • I want to use a linked list and assume that I don't know the size of the file – ourmando Feb 23 '12 at 06:08
  • @Blastfurnace Ok, but what's an actual example of this not working? In practice the last word in the text will get read in by `fin`, the `eofbit` will get set right then, and everything will be fine. – Matt Phillips Feb 23 '12 at 14:51
  • 1
    The `eofbit` is only set after a __failed__ read, not when the last word is read. This means your code will call `push_back` an extra time. Note in [this example code](http://ideone.com/uH9Gc) the vector size is 5 when there are only 4 words in the input. – Blastfurnace Feb 23 '12 at 15:02
  • @Blastfurnace `std::cin` never found an eof, but modifying your code to have it read from a text file ("a b c d") shows that you are right. Thanks-- – Matt Phillips Feb 23 '12 at 16:51
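
To make the point from the comments concrete, here is a small sketch comparing the two loop styles. Both function names are invented for this illustration. The extra element shows up when the file ends with whitespace such as a trailing newline, which most text editors add; the last successful read then stops before the end of the file, so eofbit is only set by the next, failed read.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper showing the problematic pattern: eof() is tested
// before the read, so a failed final read still reaches push_back.
std::vector<std::string> read_with_eof_test(const std::string& path) {
    std::vector<std::string> doc;
    std::ifstream fin(path);
    if (!fin) {
        return doc;                 // avoid looping forever on a bad stream
    }
    std::string word;
    while (!fin.eof()) {
        fin >> word;                // may fail and leave `word` unchanged
        doc.push_back(word);        // pushes the stale word a second time
    }
    return doc;
}

// Hypothetical helper showing the usual fix: the extraction is the condition.
std::vector<std::string> read_with_extraction_test(const std::string& path) {
    std::vector<std::string> doc;
    std::ifstream fin(path);
    std::string word;
    while (fin >> word) {           // loop body runs only after a good read
        doc.push_back(word);
    }
    return doc;
}
```

For a file containing "a b c d" followed by a newline, the first function returns five elements with the last word duplicated, while the second returns four.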
0

You could use a std::vector and store the words/lines in there, that way you don't have to know the size.

There's this good reference article on file I/O here. When reading the text file, they read line by line in a while( myFile.good() ) loop and get the current line using getline(myFile, line) where myFile is your ifstream object and line is a string you want to store the current line in.

In the example they just output the current line, but you could append it to a vector if you want as well.
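
A minimal sketch of the line-by-line version, with each line appended to a vector, might look like this. The name read_lines is made up for illustration, and the loop uses getline itself as the condition instead of myFile.good(), so a failed final read never gets stored.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the file at `path` one line at a time.
std::vector<std::string> read_lines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream myFile(path);
    std::string line;
    // getline returns the stream, which tests false once a read fails.
    while (std::getline(myFile, line)) {
        lines.push_back(line);      // the newline itself is not stored
    }
    return lines;
}
```

Unlike the >> operator, this keeps the spaces inside each line, so the choice depends on whether you want words or whole lines.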

And, just out of curiosity, what do you use the >> operator for? Couldn't the same be achieved with getline()?

rcplusplus