You first need to understand the RIFF header. You will want to open your wavs as binary data.
The first 4 bytes of the header are the 'magic number'. In this case, 52 49 46 46, or RIFF. This is in big-endian(BE) format.
The next 4 bytes are the file size minus 8 (that is, everything after this size field). This is in little-endian (LE) format, so read the 4 bytes backwards, from position 8 down to position 5. This will obviously vary based on the size of the wav file.
The next 4 bytes are BE and will always be 57 41 56 45, or WAVE, because it's a wav file.
The next 4 bytes are BE and will always be 66 6D 74 20, or fmt . That's fmt with a space after it. From here things become more variable from file to file, and having the correct matching values is critical to this being somewhat easy.
The chunk size is 4 bytes LE. This will likely be 10 00 00 00 (16) or 28 00 00 00 (40). The chunk is used to let the audio processor determine how to read the actual audio data.
The next 2 bytes are LE and determine the format. 01 00 means PCM, or pulse code modulation. This is usually set by the recording software and shouldn't be changed.
The next 2 bytes are LE and determine if the audio file is mono (01 00) or stereo (02 00).
The next 4 bytes are LE and determine the sample rate. This is highly variable but must match for proper reading. For instance, 44 AC 00 00 would be 44,100, as in 44,100 Hz.
The next 4 bytes are LE and determine the byte rate (bytes per second). This can be calculated as channels x sample rate x bits / 8.
The next 2 bytes are LE and give the block alignment (bytes per sample frame). This can be calculated as channels x bits / 8.
The next 2 bytes are LE and determine the bits per sample. 08 00 means 1 sample takes 1 byte.
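The next 4 bytes are BE and mark the start of the audio chunk. This is 64 61 74 61, or data. It is followed by 4 more bytes, LE, giving the size of the audio data in bytes, and that ends the header.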
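As a rough sketch of the layout described above, the fields can be unpacked in Python with the standard struct module. This assumes the simplest layout only: a 16-byte fmt chunk followed directly by the data tag, plus the 4-byte little-endian data size that sits right after that tag. The function name parse_wav_header and the dictionary keys are made up for illustration, and files with extra chunks are rejected rather than handled.

```python
import struct

def parse_wav_header(header: bytes) -> dict:
    """Unpack a canonical 44-byte PCM WAV header (hypothetical helper)."""
    if len(header) < 44:
        raise ValueError("need at least 44 bytes")
    # '<' = little-endian, no padding; 4s = 4 raw bytes, I = uint32, H = uint16.
    # The 4-character tags are read as raw bytes, so their order is preserved.
    (riff, riff_size, wave, fmt_tag, fmt_size, audio_format, channels,
     sample_rate, byte_rate, block_align, bits_per_sample,
     data_tag, data_size) = struct.unpack("<4sI4s4sIHHIIHH4sI", header[:44])
    if riff != b"RIFF" or wave != b"WAVE" or fmt_tag != b"fmt ":
        raise ValueError("not a RIFF/WAVE file")
    if fmt_size != 16 or data_tag != b"data":
        # Assumption: anything else is outside the scope of this sketch.
        raise ValueError("non-canonical layout (extended fmt or extra chunks)")
    return {
        "riff_size": riff_size,          # file size minus 8
        "audio_format": audio_format,    # 1 = PCM
        "channels": channels,            # 1 = mono, 2 = stereo
        "sample_rate": sample_rate,      # samples per second
        "byte_rate": byte_rate,          # channels * sample_rate * bits / 8
        "block_align": block_align,      # channels * bits / 8
        "bits_per_sample": bits_per_sample,
        "data_size": data_size,          # bytes of audio that follow
    }
```

You would feed it the first 44 bytes of a file opened in binary mode, for example open(path, "rb").read(44), where path is whatever wav you are inspecting.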
Everything following this is audio data. Here you would split the files, combine them, insert the blank data (seconds of silence x byte rate gives the number of bytes to insert), then read the entire length and create a new header.
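The combine-and-rebuild step could look something like the sketch below. It assumes both inputs are raw PCM audio bytes (header already stripped) with the same channels, sample rate and bit depth, and that a plain 44-byte header is wanted on output. The names build_header and join_with_silence are invented for this example.

```python
import struct

def build_header(channels: int, sample_rate: int, bits: int, data_len: int) -> bytes:
    """Create a canonical 44-byte PCM header for data_len bytes of audio."""
    byte_rate = channels * sample_rate * bits // 8
    block_align = channels * bits // 8
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_len,      # RIFF size = file size minus 8
        b"WAVE", b"fmt ", 16,        # 16-byte fmt chunk
        1, channels,                 # 1 = PCM
        sample_rate, byte_rate, block_align, bits,
        b"data", data_len,
    )

def join_with_silence(audio_a: bytes, audio_b: bytes, gap_seconds: float,
                      channels: int, sample_rate: int, bits: int) -> bytes:
    """Return a complete wav: audio_a, a silent gap, then audio_b."""
    block_align = channels * bits // 8
    frames = int(gap_seconds * sample_rate)   # whole sample frames only
    # 16-bit PCM is signed, so silence is all zero bytes;
    # 8-bit PCM is unsigned, so silence sits at the midpoint, 0x80.
    fill = b"\x80" if bits == 8 else b"\x00"
    silence = fill * (frames * block_align)
    data = audio_a + silence + audio_b
    return build_header(channels, sample_rate, bits, len(data)) + data
```

Working in whole sample frames keeps the gap a multiple of the block alignment, so the second clip does not end up shifted by half a sample.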
Take the following header from a random wav file I have for example:
52 49 46 46 14 60 28 00 57 41 56 45 66 6D 74 20
10 00 00 00 01 00 01 00 22 56 00 00 44 AC 00 00
02 00 10 00 64 61 74 61
We see RIFF, then a file size of 00 28 60 14 (flipped endian), or 2,646,036, where the file on disk is 2,646,044 bytes, so you can see the 8-byte difference. Last on the top line we see WAVEfmt . Following that is 10 00 00 00, which tells us the fmt chunk is 16 bytes long. 01 00 tells us it is PCM. The next 01 00 tells us it is mono. The 00 00 56 22 (flipped endian) tells us the sample rate is 22,050 Hz. 00 00 AC 44 (flipped endian) tells us the byte rate is 44,100 bytes per second (1 channel x 22,050 x 16 / 8). On the last line, 02 00 is the block alignment (2 bytes per sample frame), 10 00 is 16 bits per sample, and the ending is data.
So we can determine the length of the audio by dividing the byte length, 2646036, by the byte rate, 44100. This gives us 60.0008, or 1 minute, which is what this audio file is. The extra 0.0008 comes from the 36 header bytes counted in that size. Assuming nothing follows the audio, the data alone is 2,646,000 bytes, or exactly 60 seconds.
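To double-check the flipped-endian reading, here is a small sketch that decodes the 40 bytes shown above with int.from_bytes. The helper le and the fields dictionary are just names picked for this example, and the offsets assume the plain layout with a 16-byte fmt chunk.

```python
# The 40 header bytes from the hex dump above.
header = bytes.fromhex(
    "52 49 46 46 14 60 28 00 57 41 56 45 66 6D 74 20 "
    "10 00 00 00 01 00 01 00 22 56 00 00 44 AC 00 00 "
    "02 00 10 00 64 61 74 61"
)

def le(offset: int, size: int) -> int:
    """Read `size` bytes at `offset` as a little-endian unsigned integer."""
    return int.from_bytes(header[offset:offset + size], "little")

fields = {
    "riff_size": le(4, 4),         # 2646036
    "fmt_size": le(16, 4),         # 16
    "audio_format": le(20, 2),     # 1 (PCM)
    "channels": le(22, 2),         # 1 (mono)
    "sample_rate": le(24, 4),      # 22050
    "byte_rate": le(28, 4),        # 44100
    "block_align": le(32, 2),      # 2
    "bits_per_sample": le(34, 2),  # 16
}

# The 4-character tags are plain ASCII and are read in order, not flipped.
tags = (header[0:4], header[8:12], header[12:16], header[36:40])

for name, value in fields.items():
    print(f"{name:16} {value}")
```

The byte rate here agrees with the formula: 1 channel x 22050 x 16 / 8 = 44100.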
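The same arithmetic as a short sketch. The function name duration_seconds is made up, and the 36-byte adjustment assumes the plain layout (16-byte fmt chunk, data chunk last, nothing after the audio), which may not hold for every file.

```python
def duration_seconds(audio_bytes: int, byte_rate: int) -> float:
    """Length of the audio in seconds: bytes divided by bytes per second."""
    return audio_bytes / byte_rate

riff_size = 2_646_036   # size field from the example header
byte_rate = 44_100      # bytes per second from the example header

# Quick estimate using the RIFF size directly, header bytes included.
approx = duration_seconds(riff_size, byte_rate)

# Assumption: 36 bytes after the size field belong to the header
# (WAVE tag + fmt chunk + data tag and size), the rest is audio.
exact = duration_seconds(riff_size - 36, byte_rate)

print(round(approx, 4), exact)
```

For anything longer than a few seconds the two numbers are practically the same, so the quick estimate is fine for a sanity check.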