
I'm trying to load a .wav file but it takes forever. The code I use:

    std::ifstream file (filePath, std::ios::binary);
    
    // check if the file exists
    if (! file.good())
    {
        reportError ("ERROR: File doesn't exist or otherwise can't load file\n"  + filePath);
        return false;
    }
    
    file.unsetf (std::ios::skipws);
    std::istream_iterator<uint8_t> begin (file), end;
    std::vector<uint8_t> fileData (begin, end);

As I said, it takes forever to load a 27MB file. But I have seen a few posts online that claim to have loaded a 106MB file in around 1400ms by using getline(). I know that ifstream::read() can load data faster, but then I wouldn't get the data as uint8_t. I haven't worked much with files, so it would be nice if somebody could explain how I could read the data quickly and get it as uint8_t. Any help would be appreciated.

EDIT:

I now reserve space for the bytes, but it is still slow:

    std::ifstream file(filePath, std::ios::binary);
    auto size = std::filesystem::file_size(filePath);
    // check if the file exists
    if (!file.good())
    {
        reportError("ERROR: File doesn't exist or otherwise can't load file\n" + filePath);
        return false;
    }

    file.unsetf(std::ios::skipws);
    std::istream_iterator<uint8_t> begin(file), end;
    std::vector<uint8_t> fileData;
    fileData.reserve(size);
    fileData = std::vector<uint8_t>(begin, end);

Any other ideas on how I might speed this up?

EDIT 2: I searched online and found this:

    char* cp = new char[100000000];
    std::ofstream ofs("bff.txt"); //make a huge file to test with.
    ofs.write(cp, 100000000);
    ofs.close();
    std::ifstream ifs("bff.txt");
    ifs.read(cp, 100000000);
    ifs.close();

This reads the file and copies all the data into the char array extremely quickly. Can anyone tell me how I might do this with uint8_ts?

FSY
  • What's wrong with `ifstream::read()`? *"I wouldn't get the data as uint8_t"* Why not? – HolyBlackCat Jun 26 '22 at 14:30
  • Ok, I misunderstood something. I debugged the code and the problem is copying the data into the vector, not reading from the file. How can I improve that? – FSY Jun 26 '22 at 14:32
  • `std::istream_iterator` is an input iterator, so the vector constructor can't tell in advance how much space to allocate. That means it has to keep resizing its data store, and that, in turn, requires copying the data multiple times. If you know the size of the file you can pre-allocate space with `fileData.reserve()`. (If someone wants to expand this into an answer, feel free...) – Pete Becker Jun 26 '22 at 14:33
  • Related: [How do I read an entire file into a std::string in C++?](https://stackoverflow.com/q/116038/364696) (and the many questions duped to it) – ShadowRanger Jun 26 '22 at 14:35
  • @FSY `.resize()` followed by one large `.read()` should be good enough. – HolyBlackCat Jun 26 '22 at 14:36
  • Allocate 20-100MB buffer in RAM, then read whole file at once. Later you can scan the file in memory. – i486 Jun 26 '22 at 14:38
  • @PeteBecker I have now edited the question to reserve space for the data, but it is still slow to actually store the data in the vector. Is there a faster method? – FSY Jun 26 '22 at 14:57
  • I suggested using `.read()` in place of `istream_iterator`. *"how I might do this with uint8_ts"* `reinterpret_cast`? – HolyBlackCat Jun 26 '22 at 15:04
  • `fileData = std::vector<uint8_t>(begin, end);` You call `reserve` on one vector (`fileData`) but actually read into a different one (a temporary constructed with `std::vector<uint8_t>(begin, end)`). Try `fileData.assign(begin, end);` - this would avoid the temporary. That said, `read()` directly into the vector would be the fastest. – Igor Tandetnik Jun 26 '22 at 15:54
  • @HolyBlackCat _reinterpret_cast?_ Certainly not! :) – Paul Sanders Jun 26 '22 at 15:54
  • @PaulSanders Apparently `basic_ifstream<uint8_t>` also works, but it's unorthodox at least. – HolyBlackCat Jun 26 '22 at 15:58
  • @HolyBlackCat Unorthodox? Not sure I see why, seems to me that's what that template is for. – Paul Sanders Jun 26 '22 at 16:01
  • @PaulSanders I believe `char_traits` is not specialized for it? Maybe it's fine, I didn't check. But the sole fact that I'd have to look it up already makes me avoid it. – HolyBlackCat Jun 26 '22 at 18:16
  • @HolyBlackCat [Seems it isn't](https://en.cppreference.com/w/cpp/string/char_traits), but given that all we're doing is calling `read` it probably doesn't matter here. – Paul Sanders Jun 26 '22 at 18:59
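
For reference, the `reserve` plus `assign` suggestion from the comments above might look roughly like the sketch below. The helper name `loadWithAssign` and its `expectedSize` parameter are hypothetical and not part of the original code. Whether `assign` reuses the reserved capacity is up to the standard library implementation, and the data is still extracted one byte at a time, so this is likely to remain much slower than a single `read()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): reads a whole file
// through istream_iterator, filling the vector that was reserved instead
// of assigning from a temporary vector.
std::vector<std::uint8_t> loadWithAssign (const std::string& filePath, std::size_t expectedSize)
{
    std::vector<std::uint8_t> fileData;

    std::ifstream file (filePath, std::ios::binary);
    if (! file)
        return fileData; // empty result on failure

    // Without this, operator>> would skip whitespace bytes in the data.
    file.unsetf (std::ios::skipws);

    fileData.reserve (expectedSize);
    std::istream_iterator<std::uint8_t> begin (file), end;

    // assign() works on fileData itself, so no temporary vector is built.
    fileData.assign (begin, end);
    return fileData;
}
```

A call such as `loadWithAssign (filePath, std::filesystem::file_size (filePath))` would mirror the edited question.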

1 Answer


The way to do this is indeed to use read, but to get your uint8_ts without having to cast any pointers, you can use std::basic_ifstream, which is a template. Its template argument is the character type, which is also the pointer type that read expects.

So, quick proof-of-concept program (no error checking or anything, just to show you how it works):

    #include <cstdint>
    #include <fstream>
    #include <vector>

    int main ()
    {
        std::basic_ifstream <uint8_t> f ("myfile", std::ios::binary);
        std::vector <uint8_t> v (100);
        f.read (v.data (), v.size ());
    }

And that's it!


Paul Sanders
  • Except that the standard only defines specializations of `std::basic_ifstream` for `char` and `wchar_t`, not for `uint8_t`. The correct usage is to use `std::ifstream` and typecast the pointer, eg: `std::ifstream f ("myfile", std::ios::binary); f.read (reinterpret_cast<char*>(v.data ()), v.size ());` The standard allows for an object pointer to be cast to `char*` for purposes of accessing the object's raw bytes. – Remy Lebeau Jun 26 '22 at 22:03
  • @RemyLebeau Yes, I also see it that way now. – Paul Sanders Jun 26 '22 at 22:17
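
Putting the comments together (size the vector up front, then one `read()` through a `char*` cast), a minimal sketch could look like the following. The function name `loadFile` and its signature are assumptions made for illustration, it needs C++17 for `std::filesystem`, and it has not been benchmarked against the 27MB file from the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <ios>
#include <system_error>
#include <vector>

// Hypothetical helper (name and signature are assumptions): loads a whole
// file into fileData with a single read() call. Returns false on failure.
bool loadFile (const std::filesystem::path& filePath, std::vector<std::uint8_t>& fileData)
{
    // Ask the filesystem for the size; the error_code overload does not throw.
    std::error_code ec;
    const auto size = std::filesystem::file_size (filePath, ec);
    if (ec)
        return false;

    std::ifstream file (filePath, std::ios::binary);
    if (! file)
        return false;

    // resize() (not reserve()) so the elements exist before read() fills them.
    fileData.resize (static_cast<std::size_t> (size));

    // read() wants a char*; viewing the uint8_t buffer as raw bytes is allowed.
    file.read (reinterpret_cast<char*> (fileData.data ()),
               static_cast<std::streamsize> (fileData.size ()));

    // gcount() reports how many bytes the last read() actually delivered.
    return file.gcount () == static_cast<std::streamsize> (fileData.size ());
}
```

The design choice here is `resize()` rather than `reserve()`: `read()` writes straight into existing elements, so the vector must already have its final size.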