I have a DirectSound application I'm writing in C, running on Windows 7. The application just captures some sound frames, and plays them back. For sanity-checking the capture results, I'm writing out the PCM data to a file, which I can play in Linux using aplay.

Unfortunately, the sound is choppy and sometimes stutters (and plays at the wrong speed in Linux). Oddly, the capture file is less distorted if the PCM data is not also being played through the playback buffer at the time of capture.

Here's the initialization of my WAVEFORMATEX:

memset(&wfx, 0, sizeof(WAVEFORMATEX));
wfx.cbSize = 0;
wfx.wFormatTag = WAVE_FORMAT_PCM;
wfx.nChannels = 1;
wfx.nSamplesPerSec = sampleRate;
wfx.wBitsPerSample = sampleBitWidth;
wfx.nBlockAlign = (wfx.nChannels * wfx.wBitsPerSample) / 8;
wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

The sampleRate is 8000, and sampleBitWidth is 16.

I create a capture and play buffer using this same structure, and the capture buffer has 3 notification positions. I start capturing with:

lpDsCaptureBuffer->Start(DSCBSTART_LOOPING);

I then spin off a playback thread that calls WaitForMultipleObjects on the events associated with the notification positions. Upon notification, I reset all the events, copy the one or two pieces of the capture buffer into a local buffer, and pass them to a play routine:

void playFromBuff(LPVOID captureBuff, DWORD captureLen) {
  LPVOID playBuff;
  DWORD playLen;
  HRESULT hr;

  /* Lock the playback buffer from offset 0 for captureLen bytes;
     no second (wraparound) region is requested. */
  hr = lpDsPlaybackBuffer->Lock(0L, captureLen, &playBuff, &playLen,
                                NULL, NULL, 0L);

  memcpy(playBuff, captureBuff, playLen);
  hr = lpDsPlaybackBuffer->Unlock(playBuff, playLen, NULL, 0L);

  /* Rewind and (re)start playback from the top of the buffer. */
  hr = lpDsPlaybackBuffer->SetCurrentPosition(0L);
  hr = lpDsPlaybackBuffer->Play(0L, 0L, 0L);
}

(some error-checking omitted).

Note that the playback buffer has no notification positions. Each time I get a chunk from the capture buffer, I lock the playback buffer starting at position 0.

The capture code, guarded by the WaitForMultipleObjects, looks like:

    lpDsCaptureBuffer->GetCurrentPosition(NULL,&readPos);

    hr = lpDsCaptureBuffer->Lock(...,...,&captureBuff1,&captureLen1,&captureBuff2,&captureLen2,0L);

where the ellipses contain calculations involving the current and previously-seen read positions. I'm omitting those likely-wrong calculations -- I suspect that's where the problem lies.

My notification positions are multiples of 1024. Yet the read positions reported are 1500, 2500, and 3500. So if I see a read position of 1500, does that mean I can read bytes 0 to 1500? And when I next see 2500, does that mean I should read from 1501 to 2500? Why do those read positions not correspond exactly to my notification positions? What's the right algorithm here?
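For context, one workable algorithm (a sketch, not DirectSound-specific; `bufSize`, `prevRead`, and `readPos` are assumed names) is to treat the capture buffer as a ring and always consume the span between the previously consumed offset and the current read cursor, splitting it into two regions when it wraps:

```c
#include <assert.h>

/* Given the previously consumed offset and the current read cursor in a
 * circular capture buffer of bufSize bytes, compute the one or two
 * regions that are now safe to copy out.  Returns the number of regions
 * (0, 1, or 2); offsets go to off1/off2, lengths to len1/len2. */
int captureRegions(unsigned bufSize, unsigned prevRead, unsigned readPos,
                   unsigned *off1, unsigned *len1,
                   unsigned *off2, unsigned *len2) {
    if (readPos == prevRead)
        return 0;                 /* nothing new captured */
    *off1 = prevRead;
    if (readPos > prevRead) {     /* no wraparound: one region */
        *len1 = readPos - prevRead;
        return 1;
    }
    *len1 = bufSize - prevRead;   /* tail of the buffer ...          */
    *off2 = 0;                    /* ... plus the wrapped-around head */
    *len2 = readPos;
    return 2;
}
```

With the read positions from the question (1500, then 2500), each notification yields one 1000-byte region starting at the previous read position; when the cursor wraps past the end of the buffer, the second region picks up the remainder at offset 0.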

I've tried the simpler alternative of stopping the capture when the capture buffer is full, without other notification positions. But that, I think, means some sound escapes capture.

Paul Steckler

1 Answer


My notification positions are multiples of 1024. Yet the read positions reported are 1500, 2500, and 3500. So if I see a read position of 1500, does that mean I can read bytes 0 to 1500? And when I next see 2500, does that mean I should read from 1501 to 2500? Why do those read positions not correspond exactly to my notification positions? What's the right algorithm here?

The DirectSound API is nowadays a compatibility layer on top of another, "real" audio capture API. This means that internally the audio capture fills buffers of its own (note the offsets ending in 500) and then passes the filled buffers up to DirectSound capture, which in turn reports them to you. This is why the read positions you see are multiples of 500: that is the granularity at which DirectSound itself receives the data.

Since you are interested in the captured data, your assumption is correct: the read position is what matters. When you get a notification, you know what offset it is safe to read up to. Because the capture API is layered, there is some latency involved: each layer must hand chunks of data to the next before they become available to you.
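Note that when the safe-to-read span wraps around the end of the buffer, Lock returns it in two pieces, so the copy into the local buffer has to concatenate both regions. A minimal portable sketch of that step (`captureBuff1`/`captureLen1` and `captureBuff2`/`captureLen2` are the names from the question; `localBuf` is an assumed destination):

```c
#include <string.h>

/* Concatenate the two regions returned by a wrapped Lock into one
 * contiguous local buffer; returns the total number of bytes copied.
 * When there was no wraparound, the second region is NULL/0. */
unsigned assembleCapture(void *localBuf,
                         const void *captureBuff1, unsigned captureLen1,
                         const void *captureBuff2, unsigned captureLen2) {
    memcpy(localBuf, captureBuff1, captureLen1);
    if (captureBuff2 != NULL && captureLen2 > 0)
        memcpy((char *)localBuf + captureLen1, captureBuff2, captureLen2);
    return captureLen1 + captureLen2;
}
```

Getting this concatenation (and the matching Unlock of both regions) right is one of the easiest places to introduce exactly the kind of choppiness described in the question.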

Roman R.
  • 68,205
  • 6
  • 94
  • 158
  • I redid my capture buffer calculations, so that now, when I save the capture data to a file, the sound is perfect when playing with Linux aplay -- IF I don't also play the data with the playback buffer in Windows. Somehow, the act of playing data in a separate buffer affects what's captured. – Paul Steckler Jan 20 '15 at 19:08
  • Some hints: you should be able to find something with debugger and tracing the order of the calls and actions. Perhaps your thread synchronization makes you somehow lose certain loops. For debugging purposes you would want to start with super long capture buffer to make sure you don't have overflows, underflows and stuff. Once things are tuned in this simple scenario, you can change it close to what you eventually want. – Roman R. Jan 20 '15 at 19:23
  • I loaded a WAV file into a buffer, and tried playing it in 2 ways. First, I tried the old-fashioned call sndPlaySound, and that sounded fine. Next, I tried using my DirectSound playback buffer. It sounded choppy and distorted, although it was at the right pitch and speed. Could that be because the emulation layer doesn't work so well? – Paul Steckler Jan 20 '15 at 23:10
  • I see it this way: if you have a standard/external method that plays captured data smoothly, then the capture itself is fine. If capture is affected by concurrent playback, then I would look into threading issues and deadlocks causing capture to act late. If it is only playback which is giving troubles, then there should be a bug there. Note that with playback you again deal with layered APIs and you have to get the data ready and take your hands off buffered data well in advance (20-30-50 ms) or the data arrives late to real playback and you hear choppiness. – Roman R. Jan 21 '15 at 08:02
  • See related Q [What is the smallest audio buffer needed to produce Tone sound without distotions with WaveOUT API](http://stackoverflow.com/questions/14293749/what-is-the-smallest-audio-buffer-needed-to-produce-tone-sound-without-distotion) for another legacy API and the same applies to DirectSound playback - there is a minimal data pre-loading time and doing less than this is leading to playing segments of silence due to late data arrival. – Roman R. Jan 21 '15 at 08:03
  • It doesn't seem to matter whether the call to Play is in the same thread or another, or whether the Play loops or is finite. It doesn't seem to matter whether the playback buffer has any data written to it, or is left untouched. In all cases, the data in the capture buffer seems to be compromised if Play is called. I verified that the sequences of read positions and capture cursor offsets in the capture buffer are roughly the same for the Play and no-Play situation. – Paul Steckler Jan 22 '15 at 21:47