I've got a simple WAV header reader that I found online a long time ago. I've gotten back around to using it, but it seems to replace around 1200 samples towards the end of the data chunk with a single random repeated number, e.g. -126800. The end of the recording is expected to be silence, so the number should be zero.

Here is the simple program:

void main() {
    WAV_HEADER* wav = loadWav(".\\audio\\test.wav");
    double sample_count = wav->SubChunk2Size * 8 / wav->BitsPerSample;

    printf("Sample count: %i\n", (int)sample_count);

    vector<int16_t> samples = vector<int16_t>();

    for (int i = 0; i < wav->SubChunk2Size; i++)
    {
        int val = ((wav->data[i] & 0xff) << 8) | (wav->data[i + 1] & 0xff);
        samples.push_back(val);
    }
    printf("done\n");
}

And here is the WAV reader:

typedef struct
{
    //riff
    uint32_t Chunk_ID;
    uint32_t ChunkSize;
    uint32_t Format;

    //fmt
    uint32_t SubChunk1ID;
    uint32_t SubChunk1Size;
    uint16_t AudioFormat;
    uint16_t NumberOfChanels;
    uint32_t SampleRate;
    uint32_t ByteRate;

    uint16_t BlockAlignment;
    uint16_t BitsPerSample;

    //data
    uint32_t SubChunk2ID;
    uint32_t SubChunk2Size;

    //Everything else is data. We note its offset
    char data[];

} WAV_HEADER;
#pragma pack()

inline WAV_HEADER* loadWav(const char* filePath)
{
    long size;
    WAV_HEADER* header;
    void* buffer;

    FILE* file;

    fopen_s(&file,filePath, "r");
    assert(file);

    fseek(file, 0, SEEK_END);
    size = ftell(file);
    rewind(file);

    std::cout << "Size of file: " << size << std::endl;

    buffer = malloc(sizeof(char) * size);
    fread(buffer, 1, size, file);

    header = (WAV_HEADER*)buffer;

    //Assert that data is in correct memory location
    assert((header->data - (char*)header) == sizeof(WAV_HEADER));

    //Extra assert to make sure that the size of our header is actually 44 bytes
    assert((header->data - (char*)header) == 44);

    fclose(file);

    return header;
}

I'm not sure what the problem is. I've confirmed that there is no metadata, nor is there a mismatch between the numbers read from the header of the file and the actual file. I'm assuming it's a size/offset misalignment on my side, but I cannot see it. Any help welcomed.

Sulkyoptimism

  • You should be opening the file in binary mode and checking the return value from `fread` to make sure you read the amount of data you expected. – Retired Ninja Jan 07 '22 at 21:15
  • Unrelated: [Here's a vastly superior way to read an entire binary file into a container](https://stackoverflow.com/a/36659103/4581301). – user4581301 Jan 07 '22 at 21:23
  • `header = (WAV_HEADER*)buffer;` is not safe. Any object can be viewed as an array of char, but the reverse isn't guaranteed to be true. [See What is the strict aliasing rule? for details](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule). That said, because memory is typically allocated on nice 32 and 64 bit boundaries, you'll get away with this most of the time. – user4581301 Jan 07 '22 at 21:44
  • @user4581301 but that's **exactly** how you deal with binary file formats with fixed headers. You verify length, then you cast to a pointer to a struct, and verify and afterwards use the elements of that struct. Yes, alignment matters, often you memcpy to an aligned area for that reason. – Marcus Müller Jan 07 '22 at 21:45
  • @MarcusMüller Agreed. You might also need an extra block in there to correct for endian, but for the most part the cast works if you've ensured the alignment. Doesn't make it legal, though. – user4581301 Jan 07 '22 at 21:54
  • @user4581301 let's not go language-lawyer on this, because of course you're right, once you load something from anywhere *it's impossible to know for the program whether it's valid data*. However, file io semantics as defined by the standard guarantee that you can, on the same platform, write a `WAV_HEADER` to a file, load it from that file into a region of the same alignment strictness, and [get a *similar*](https://en.cppreference.com/w/cpp/language/reinterpret_cast#Type_aliasing) type, so that it's *not* undefined behaviour what happens! – Marcus Müller Jan 07 '22 at 22:01
  • Those are the new similarity rules [I'm still wrapping my head around](https://stackoverflow.com/a/70613445/4581301), figuring out what C++ considers similar enough and what it doesn't. Those rules were a long time coming, because all of the ass-covering in C++ kind of flies in the face of common sense a lot of the time. – user4581301 Jan 07 '22 at 22:19
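
The advice in the first comment (open in binary mode and check what `fread` returns) can be sketched as below. On Windows, a file opened with `"r"` is read in text mode, where CR LF pairs are collapsed and a 0x1A byte is treated as end of file, so `fread` can return fewer bytes than `size` and leave the tail of the `malloc` buffer uninitialized. That would be consistent with a repeated garbage value near the end, though it is an inference from the symptoms and not something confirmed in this thread. The names `readFileBinary` and `decodePcm16LE` are hypothetical, the sketch uses standard `fopen` instead of `fopen_s`, and the decoder assumes 16-bit little-endian PCM. Note that the loop in the question advances one byte at a time and combines each pair in big-endian order, whereas this one steps two bytes per sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <vector>

// Hypothetical helper: read a whole file as raw bytes.
// "rb" disables the text-mode translation described above.
std::vector<unsigned char> readFileBinary(const char* path)
{
    assert(path != nullptr);
    std::FILE* file = std::fopen(path, "rb");
    if (!file) throw std::runtime_error("cannot open file");

    std::fseek(file, 0, SEEK_END);
    const long size = std::ftell(file);
    std::rewind(file);
    if (size < 0) {
        std::fclose(file);
        throw std::runtime_error("cannot determine file size");
    }

    std::vector<unsigned char> bytes(static_cast<std::size_t>(size));
    const std::size_t got =
        bytes.empty() ? 0 : std::fread(bytes.data(), 1, bytes.size(), file);
    std::fclose(file);

    // A short read means part of the buffer was never filled from the file.
    if (got != bytes.size()) throw std::runtime_error("short read");
    return bytes;
}

// Hypothetical helper: decode 16-bit little-endian PCM, two bytes per sample.
// Assumption: the data really is 16-bit PCM; check AudioFormat and
// BitsPerSample from the fmt chunk before relying on this.
std::vector<std::int16_t> decodePcm16LE(const unsigned char* data,
                                        std::size_t byteCount)
{
    std::vector<std::int16_t> samples;
    samples.reserve(byteCount / 2);
    for (std::size_t i = 0; i + 1 < byteCount; i += 2) {
        // Low byte comes first in a WAV file.
        const std::uint16_t raw =
            static_cast<std::uint16_t>(data[i] | (data[i + 1] << 8));
        samples.push_back(static_cast<std::int16_t>(raw));
    }
    return samples;
}
```

With these two pieces, the sample payload would be passed as `decodePcm16LE(bytes.data() + dataOffset, dataSize)`, where `dataOffset` and `dataSize` come from the header rather than from a fixed 44.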

2 Answers

WAV is just a container for different audio sample formats.

You're making assumptions about a WAV file that would have been OK on Windows 3.11 :) These don't hold in 2021.

Instead of rolling your own WAV file reader, simply use one of the available libraries. I personally have had good experiences with libsndfile, which has been around roughly forever, is very slim, and can deal with all prevalent WAV file formats, as well as a lot of other file formats unless you disable that.

This looks like a Windows program (one notices by the fact that you're using very WIN32 API style capital struct names, which is a bit old-school), so you can download libsndfile's installer from the GitHub releases and use it directly in your Visual Studio (another blind guess).

Marcus Müller
  • While I appreciate this answer, and this is something I will absolutely do, I'm still just trying to get this working as a learning exercise. I get that I'm using an old C file for this, but the WAV I'm using is pretty standard as far as I'm aware. Other programs confirm that the number of bytes and samples are the same as the numbers read here. – Sulkyoptimism Jan 08 '22 at 00:52
  • @Sulkyoptimism your WAV file doesn't fit the structure you're assuming. There's no way to fix this other than to write a correct WAV parser and accommodate the data your WAV actually contains. "Pretty standard": as I said, this is 30 (!) years old stuff, and "pretty standard" has a changing meaning over time. – Marcus Müller Jan 08 '22 at 11:20
  • Okay, well, if that's the case, is there any documentation that would outline the differences between the common WAV formats? I've not been able to find any, and I'm assuming you're answering from experience, as you've not linked any resources. – Sulkyoptimism Jan 09 '22 at 12:53
  • I've linked to libsndfile's source code! https://github.com/libsndfile/libsndfile/blob/master/src/wav.c tells you how *they* parse a WAV file. It's not *totally* different from what you do, but there are *way* more decisions to be made. My experience is that I worked on a piece of software that was used to open WAV files, often ones generated by the appropriate MATLAB functions, but at some point we realized we were no longer able to open `scipy`-written WAV files. A friend of mine had written the original wavfile source, and it worked beautifully, for its very limited set of compatible .wavs. – Marcus Müller Jan 09 '22 at 13:04
  • That's perfect, thank you. I'll try reading this and then eventually use libsndfile anyway, but I like to know this stuff, and I want to know the answer to the question at a lower level. – Sulkyoptimism Jan 09 '22 at 13:18
  • @Sulkyoptimism you're very welcome, and I definitely cheer you on for your efforts! – Marcus Müller Jan 09 '22 at 13:40

Apple (macOS and iOS) software often does not create WAVE/RIFF files with just a canonical Microsoft 44-byte header at the beginning. Those WAVE files can instead use a longer header followed by a padding block.

So you need to parse the file according to the full WAVE RIFF format specification instead of just reading from a fixed-size 44-byte struct.
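
As a rough illustration of what parsing chunk by chunk means, here is a minimal sketch that walks the chunk list instead of assuming the `data` chunk starts at byte 44. It relies only on the basic RIFF layout: a 12-byte `RIFF`/size/`WAVE` header, followed by chunks that each have a 4-character ID, a little-endian 32-bit size, and a payload padded to an even length. The names `ChunkRef`, `readU32LE` and `findChunk` are hypothetical, the file is assumed to be fully loaded into memory already, and the sketch does not interpret the `fmt ` chunk or handle variants such as RF64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Location of one chunk's payload inside the file buffer.
struct ChunkRef {
    std::size_t offset = 0;  // byte offset of the payload, just past the 8-byte chunk header
    std::uint32_t size = 0;  // payload size in bytes, as stored in the file
    bool found = false;
};

// Little-endian 32-bit read that does not depend on host byte order or alignment.
static std::uint32_t readU32LE(const unsigned char* p)
{
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: return the first chunk whose four-character ID
// matches `id`, for example "fmt " or "data".
ChunkRef findChunk(const std::vector<unsigned char>& file, const char* id)
{
    assert(id != nullptr);
    ChunkRef result;

    // Reject anything that does not start with "RIFF" <size> "WAVE".
    if (file.size() < 12 ||
        std::memcmp(file.data(), "RIFF", 4) != 0 ||
        std::memcmp(file.data() + 8, "WAVE", 4) != 0)
        return result;

    std::size_t pos = 12;
    while (pos + 8 <= file.size()) {
        const std::uint32_t size = readU32LE(file.data() + pos + 4);
        const std::size_t remaining = file.size() - (pos + 8);

        // Stop if the header claims more payload than the file holds.
        if (size > remaining) break;

        if (std::memcmp(file.data() + pos, id, 4) == 0) {
            result.offset = pos + 8;
            result.size = size;
            result.found = true;
            return result;
        }

        // Skip the payload; odd-sized chunks are followed by one pad byte.
        pos += 8 + static_cast<std::size_t>(size) + (size & 1u);
    }
    return result;
}
```

A reader built on this would look up `"fmt "` first to learn the sample format, then `"data"` for the samples, and would simply skip any other chunks (padding, `LIST`, and so on) that sit in between.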

hotpaw2