First of all, you'll need to understand what you are working with.
WAV is a RIFF-based format which usually stores the sound wave as PCM.
Essentially, PCM means that discrete values of the wave are stored at a certain sample rate (typically 44.1 kHz).
Each sample may contain information about one or more channels (typically 2).
The value of each sample is stored as a fixed-size integer or float (typically a 16 bit integer).
These attributes are stored in the WAV header.
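If you happen to use Python, a minimal sketch with the standard-library `wave` module could read these header fields and derive the ByteRate from them. The function name `describe` is made up for illustration, and note that `wave` only handles integer PCM, not float WAV files.

```python
import wave


def describe(path):
    """Print the header attributes of a WAV file and the derived ByteRate."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()        # samples per second (per channel)
        channels = w.getnchannels()    # e.g. 2 for stereo
        bits = w.getsampwidth() * 8    # getsampwidth() returns bytes per sample
        byte_rate = rate * channels * bits // 8
        print(f"{path}: {rate} Hz, {channels} ch, {bits} bit, "
              f"ByteRate={byte_rate}, frames={w.getnframes()}")
        return rate, channels, bits, byte_rate
```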
To combine two separate WAV files, you need to read the header of both files. If you are lucky, they will have the same format (sample rate, channel count and bits per sample, and therefore the same ByteRate == samplerate * channel count * bits/sample / 8). Then you simply need to append the second file minus its header to the end of the first, and add the data length of the second to the length fields of the first (the data chunk size and the RIFF chunk size).
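Assuming Python again, the `wave` module can do the header bookkeeping for you. This sketch (`concat_wavs` is a hypothetical name) refuses to merge files whose format differs, and otherwise copies the raw frames of both files into a new file:

```python
import wave


def concat_wavs(first, second, out):
    """Append the audio of `second` to `first`, writing the result to `out`.

    Only works when both files share channel count, sample width and
    sample rate; otherwise the audio would need re-encoding first.
    """
    with wave.open(first, "rb") as a, wave.open(second, "rb") as b:
        pa, pb = a.getparams(), b.getparams()
        fmt_a = (pa.nchannels, pa.sampwidth, pa.framerate)
        fmt_b = (pb.nchannels, pb.sampwidth, pb.framerate)
        if fmt_a != fmt_b:
            raise ValueError("formats differ; re-encode one file first")
        with wave.open(out, "wb") as w:
            w.setnchannels(pa.nchannels)
            w.setsampwidth(pa.sampwidth)
            w.setframerate(pa.framerate)
            # readframes() returns only the sample data, without the header
            w.writeframes(a.readframes(pa.nframes))
            w.writeframes(b.readframes(pb.nframes))
            # the wave writer fills in the RIFF and data chunk sizes itself
```

Writing a new file rather than patching the first one in place saves you from locating the size fields by hand, since the writer takes care of them.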
In any other case, I advise you to use a library that can re-encode the audio into a common format.
If you have the time and patience, you could do the re-encoding yourself.
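As an example of what re-encoding yourself involves, here is a Python sketch of the simplest case: changing the bit depth from 8 bit unsigned to 16 bit signed (`u8_to_s16` is a made-up name). Resampling to a different sample rate is considerably more work and is not shown.

```python
import array
import sys


def u8_to_s16(data: bytes) -> bytes:
    """Convert 8 bit unsigned PCM samples to 16 bit signed little-endian PCM."""
    # move the unsigned midpoint (128) to 0, then scale up to the 16 bit range
    out = array.array("h", ((b - 128) << 8 for b in data))
    if sys.byteorder == "big":
        out.byteswap()  # WAV sample data is little-endian
    return out.tobytes()
```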
If you don't want to bother with any of this, try a complete program (e.g. sox) that does what you need.
Btw.: silence is a run of 0 values if the samples are signed, and the midpoint value (128 for 8 bit) if they are unsigned, which in WAV is only the case for 8 bit samples.
So to get 4 seconds of silence, you need n = 4 * sample rate * channel count * bits per sample / 8 bytes of 0 (e.g. 4 * 44100 * 2 * 16 / 8 = 705,600 bytes for 16 bit stereo at 44.1 kHz).
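A Python sketch of that formula, assuming the `wave` module once more (`silence_frames` and `write_silence` are hypothetical helpers). It uses 0x80 as the fill byte for 8 bit files and 0 otherwise:

```python
import wave


def silence_frames(seconds, rate, channels, bits):
    """Return raw PCM bytes representing `seconds` of silence."""
    n = int(seconds * rate) * channels * bits // 8  # number of bytes
    # 8 bit WAV samples are unsigned, so their midpoint 0x80 is silence;
    # wider samples are signed, so silence is 0
    fill = b"\x80" if bits == 8 else b"\x00"
    return fill * n


def write_silence(path, seconds, rate=44100, channels=2, bits=16):
    """Write a WAV file containing only silence."""
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(bits // 8)
        w.setframerate(rate)
        w.writeframes(silence_frames(seconds, rate, channels, bits))
```

For example, `write_silence("silence.wav", 4)` should produce 4 seconds of 16 bit stereo silence at 44.1 kHz, i.e. 705,600 bytes of sample data plus the header.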
Trivia: you could use any constant value instead of 0 for silence, since only changes in the value are audible.