
In my program, I may need to load a large file, but not always. So I have defined:

char** largefilecontents;
string fileName="large.txt";

When I need to load the file, the program calls this function:

bool isitok=LoadLargeFile(fileName,largefilecontents);

And the function is:

bool LoadLargeFile(string &filename, char ** &lines)
{
    if (lines) delete [] lines;
    ifstream largeFile;
    #ifdef LINUX
    largeFile.open(filename.c_str());
    #endif
    #ifdef WINDOWS
    largeFile.open(filename.c_str(),ios::binary);
    #endif      
    if (!largeFile.is_open()) return false;
    lines=new char *[10000];
    if (!lines) return false;
    largeFile.clear();
    largeFile.seekg(ios::beg);
    for (int i=0; i>-1; i++)
    {
        string line="";
        getline(largeFile,line);
        if (largeFile.tellg()==-1) break; //when end of file is reached, tellg returns -1
        lines[i]=new char[line.length()];
        lines[i]=const_cast<char*>(line.c_str());
        cout << lines[i] << endl; //debug output
    }

    return true;
}

The debug output inside the function, `cout << lines[i] << endl;`, looks fine. But when I then check the result in the main program like this, it is all messed up:

for (i=0; i<10000; i++)
  cout << largefilecontents[i] << endl;

So within the function LoadLargeFile(), the results are fine, but outside LoadLargeFile(), the results are all messed up. My guess is that the `char ** &lines` part of the function isn't right, but I do not know what it should be.

Could someone help me? Thank you in advance!

  • you should delete each line first before you delete the array of pointers otherwise you end up with memory leaks. – AndersK May 27 '14 at 21:01

4 Answers


Check what passing by value and passing by reference are. The problem is that the changes are not preserved after the function terminates.

However, I feel the problem starts with how you declare your 2D char array. You do that statically, where you could easily do it dynamically, like this:

#include <cstdio>   // printf
#include <cstring>  // strcpy

// We return the pointer
char **get(int N, int M) /* Allocate the array */
{
    /* Normally, you should check if allocation was ok. */
    char** ary = new char*[N];
    for(int i = 0; i < N; ++i)
        ary[i] = new char[M];
    return ary;
}

void delete2Darray(char** p, int N) {
    int i;
    for(i = 0 ; i < N ; i++)
        delete [] p[i];
    delete [] p; // allocated with new[], so it must be freed with delete[]
}

void foo(char** p)
{
  printf("%s\n", p[0]);
  strcpy(p[0], "sam");
  printf("%s\n", p[0]);
}

int main()
{
    char** A = get(1, 5);
    strcpy(A[0], "dad");
    foo(A);
    printf("%s\n", A[0]);
    delete2Darray(A, 1);
    return 0;
}

which gives the output:

dad
sam
sam

Here is a simple example in C about this.


The problem is where you use `line.c_str()`. You first allocate memory for your new string with `lines[i]=new char[line.length()];`, but then you overwrite the pointer in `lines[i]` with a pointer into the local variable `line`. That buffer is destroyed when `line` goes out of scope. What you need to do is use `strcpy` instead of a simple pointer assignment.

Try the following

lines[i]=new char[line.length()+1]; //Thanks @Claptrap for the catch
strncpy(lines[i], line.c_str(), line.length()+1); //Note, strcpy is not considered safe.
                                                  //Use strncpy

If you want to stick to strictly c++ ways then I'd recommend using @christian-hackl 's suggestion of making

std::vector<std::string> largefilecontents;
...
bool LoadLargeFile(string &filename, std::vector<std::string> &lines)
{
...
    //lines=new char *[10000];
    //if (!lines) return false;
    lines.resize(10000); //Since in your case 10000 is fixed
...
    for...
        if(i >= lines.size()) break; //safety net in case more than 10000 lines
        getline(largeFile,lines[i]); //reads straight into the vector element
        if (largeFile.tellg()==-1) break; //when end of file is reached, tellg returns -1
        //lines[i]=new char[line.length()];
        //lines[i]=const_cast<char*>(line.c_str());

A Side Note: Arrays vs. Vectors

Arrays are raw C++ datatypes, while vectors are classes that are part of the standard template library. In theory this means vectors are slower than arrays, but in practice the difference is negligible. Vectors provide two huge benefits for that negligible cost:

  • Memory management. You don't have to delete the memory: as soon as the vector object goes out of scope, the memory it allocated is freed. In your example you don't strictly need this, because your array is never destroyed in the trivial case.
  • Flexibility. With arrays you have to know the expected size upfront, while vectors expand as needed. Again, you don't strictly need this, since as you said the size is fixed.

There are two costs associated with vectors

  • Performance: Operating on a vector can be slightly slower than operating on an array, but since most (if not all) of the operations are inline functions, the compiler gets to optimize the heck out of it and make it more or less as efficient as an array. This is especially true for fixed-size vectors.
  • Memory: A vector is a container that uses an array internally, so a vector object carries a slight memory overhead. But compare that with the savings you can make by not allocating a huge array upfront and instead growing the vector gradually. The overhead of a vector is a concern only if you create lots of small vectors; in your case it is insignificant.

References:

  1. array vs vector vs list
  2. C++ STL vector vs array in the real world
  3. http://www.cplusplus.com/forum/beginner/6321/
  4. http://www.cprogramming.com/tutorial/stl/vector.html
  • Thanks for writing that all out for me. But regarding vectors of strings, isn't that slower than an array of chars? – L.A. Rabida May 28 '14 at 12:01
  • Added a discussion on Arrays vs Vectors. – Spundun May 28 '14 at 16:42
  • I also rewrote the c++ code to fix bugs, avoid unnecessary copies and set the vector size to 10000 upfront since in your case it's fixed number of lines. – Spundun May 28 '14 at 18:15

The lines

   lines[i]=new char[line.length()];
   lines[i]=const_cast<char*>(line.c_str());

do not do what you expect them to

`lines[i] = new char[line.length()]` allocates bytes for the string but does not include room for the terminating \0.

`lines[i]=const_cast<char*>(line.c_str());` then overwrites that pointer with one that points into the local string variable `line`, which lasts only one iteration. What you need to do is allocate a buffer on the heap, as you did for the pointers, and then copy into it:

lines[i] = new char[line.length() + 1];
strcpy(lines[i], line.c_str() );

If this is a school exercise then I will not say more, but if it is a real program you should instead use a vector<string> to keep things simple, rather than fiddling with raw arrays.

One other thing: it is better to initialize all your pointers to NULL before reading in lines, so that the caller of the function knows when to stop processing the pointers.

That way you can write

for (int j = 0; lines[j] != NULL; ++j)
  cout << lines[j] << endl;
  • To be honest I didn't know about vectors, but I did make an array of chars on purpose instead of an array of strings, because I know that an array of strings would operate much slower. But I wonder, is that true of a vector of strings as well? – L.A. Rabida May 28 '14 at 12:03
  • A vector of strings would not add much overhead since the strings would be moved, not copied, when passed to the caller. Try to avoid C-like constructs and embrace C++ when you can :) – AndersK May 28 '14 at 18:21

This piece of code has many deficiencies, so it is very hard to reason about it. For example:

if (!lines) return false;

is useless, because new will (in any normal environment) not return null but throw an exception if memory runs out.

The next thing is, why the magic number 10000? What happens if your file contains more than 10000 lines? Look at your loop:

for (int i=0; i>-1; i++)

It just increments i until it has the maximum allowed value for an int, at which point the incrementation yields undefined behaviour (e.g. seemingly random crashes).

Next, this line:

getline(largeFile,line);

You do not check the stream for errors after the operation but just assume that the reading worked. That's not good practice.

Finally, this line:

lines[i]=const_cast<char*>(line.c_str());

Casting away the constness of a `char const *` (and later writing through the pointer) yields undefined behaviour as well.

Have you even tried just storing the contents of the "large file" in a std::vector<std::string>? Chances are that your assumption that it's too big for a normal string vector and that you thus have to use pointers is simply wrong.

  • The magic number 10000 is because the file size is always (without exception) limited to a fixed number. You say "You do not check the stream for errors after the operation but just assume that the reading worked. That's not good practice." How can one check the stream for errors? Thanks. – L.A. Rabida May 28 '14 at 11:43
  • @L.A.Rabida: It depends, to put it bluntly, on how important your program is. For example, is it just a personal tool to handle some personal files on your own computer or does it run on a server where critical data is accessed by others? In the latter case, assuming that all input files are always correct is disastrous, whereas in the former case, you probably just don't care when bad input crashes the software. Error checking in C++ streams is best accomplished through their overloaded operators, resulting in the idiomatic `if (!largeFile) { /* error */ }` after operations which may fail. – Christian Hackl May 28 '14 at 15:27