0

I've been trying to compress strings and save them to text files, then read the data and decompress it. When I try to decompress the read string, however, I get a Z_BUF_ERROR (-5) and the string may or may not decompress.

In the console, I can compress/decompress all day:

std::string s = zlib_compress("HELLO asdfasdf asdf asdfasd f asd f asd f awefo@8 892y*(@Y");
std::string e = zlib_decompress(s);

The string e will return the original string with no difficulty.

However, when I do this:

zlib_decompress(readFile(filename));

I get a Z_BUF_ERROR. I think it might be due in part to hidden characters in files, but I'm not really sure.

Here's my readFile function:

std::string readFile(std::string filename)
{
    std::ifstream file;
    file.open(filename.c_str(), std::ios::binary);

    file.seekg (0, std::ios::end);
    int length = file.tellg();
    file.seekg (0, std::ios::beg);

    char * buffer = new char[length];

    file.read(buffer, length);
    file.close();

    std::string data(buffer);

    return data;
}

When I write the compressed data, I use:

void writeFile(std::string filename, std::string data)
{
    std::ofstream file;
    file.open(filename.c_str(), std::ios::binary);
    file << data;
    file.close();
}

If needed, I'll show the functions I use to de/compress, but if it works without the File IO, I feel that the problem is an IO problem.

  • What length do the files have you are reading? Severa MB? Or just as the strings you are using? – bash.d Feb 26 '13 at 14:39
  • 1
    Though I don't immediately see why your code doesn't work as expected, I believe I see a memory leak: `buffer` in `readFile()` is never deallocated. – ahans Feb 26 '13 at 14:40
  • @ahans Wow. Thanks. I didn't notice that. –  Feb 26 '13 at 14:41
  • It would probably be best to get rid of manually allocated buffer altogether. You could use a `vector buf` and then pass `&buf[0]` to file.read. Alternatively see the accepted answer to this question: http://stackoverflow.com/questions/1816319/reading-directly-from-an-stdistream-into-an-stdstring – ahans Feb 26 '13 at 14:44
  • What are the `zlib_compress` and `zlib_decompress` functions doing, can you provide some source? They are not from the original zlib. – PlasmaHH Feb 26 '13 at 14:44
  • @ahans Don't vectors have a slight overhead? And what is the advantage as opposed to manual allocation? –  Feb 27 '13 at 01:23
  • @Ken Sure they have a slight overhead, but that's not much more than an int holding the vector's size. Furthermore, in `readFile()` you would use an automatically memory managed `vector`, so that int would live on the stack and not incur an allocation overhead. The advantage would be that you don't have to do memory management yourself and can be sure that the code doesn't leak. And if you go with Timo Geusch's suggestions, you would use a `vector` anyway and don't have to convert. The resulting code would work as you want it to, be shorter and thus more maintainable. – ahans Feb 27 '13 at 08:45

3 Answers3

4

First, you're dealing with binary data that might or might not have embedded null characters. std::string isn't really the correct container for that, although you can handle embedded null characters if you do it correctly. However, using a std::string to store something documents a certain expectation and you're breaking that convention.

Second, the line std::string data(buffer); isn't doing what you think it does - that is the constructor you're supposed to use to construct a string from a null-terminated C string. You're dealing with binary data here so there is a chance that you're either don't get the full buffer into the string because it encounters a null terminator in the middle of the buffer, or it runs off the end of the buffer until it finds a null (or a seg fault). If you absolutely, positively must use a std::string, use the "correct" constructor, which would be std::string data(buffer, length);.

All that said, you are using the wrong data structure - what you want is a dynamic array of char/unsigned char. That would be a std::vector, not a std::string. As an aside, you should also pass the parameters to readFileand writeFile by const reference, the code that you wrote will make copies of the strings and if the buffer you pass into writeFile() is large, that will lead to an unpleasant hit in memory consumption and performance, plus it is completely unnecessary.

Timo Geusch
  • 24,095
  • 5
  • 52
  • 70
1

As the file might contain '\0' characters, you should specify the size when you assign the content to the std::string.

std::string data(buffer, length);
alexisdm
  • 29,448
  • 6
  • 64
  • 99
0

For what it's worth, here's how you could alter readFile() and writeFile():

std::vector<char> readFile(const std::string& filename)
{
    std::ifstream file;
    file.open(filename.c_str(), std::ios::binary);

    file.seekg (0, std::ios::end);
    const int length = file.tellg();
    file.seekg (0, std::ios::beg);

    std::vector<char> data(length);
    file.read(&data[0], length);
    file.close();

    return data;
}

void writeFile(const std::string& filename, const std::vector<char>& data)
{
    std::ofstream file;
    file.open(filename.c_str(), std::ios::binary);
    file.write(&data[0], data.size());
    file.close();
}

Then you would also change your compress() and decompress() functions to work with std::vector<char>. Also note that so far the code is lacking any error handling. For example, what happens if the file doesn't exist? After calling file.open() you can check for any error by doing if (!file) { /* error handling */ }.

ahans
  • 1,697
  • 12
  • 19