
I'm using the Windows Media Foundation API to enumerate both my microphones and available cameras, and both enumerations work.

Here is my enumeration code:

class deviceInput {
public:
    deviceInput( REFGUID source );
    ~deviceInput();

    int listDevices(bool refresh = false);
    IMFActivate *getDevice(unsigned int deviceId);
    const WCHAR *getDeviceName(unsigned int deviceId);

private:
    void Clear();
    HRESULT EnumerateDevices();

    UINT32      m_count;
    IMFActivate **m_devices;
    REFGUID     m_source;
};

deviceInput::deviceInput( REFGUID source )
    : m_devices( NULL )
    , m_count( 0 )
    , m_source( source )
{   }

deviceInput::~deviceInput()
{
    Clear();
}

int deviceInput::listDevices(bool refresh)
{
    if ( refresh || !m_devices ) {
        if ( FAILED(this->EnumerateDevices()) ) return -1;
    }
    return m_count;
}

// Returns the device's IMFActivate with an extra reference; the caller must Release() it.
IMFActivate *deviceInput::getDevice(unsigned int deviceId)
{
    if ( deviceId >= m_count ) return NULL;

    IMFActivate *device = m_devices[deviceId];
    device->AddRef();

    return device;
}

// Returns the friendly name; the caller must free the returned string with CoTaskMemFree().
const WCHAR *deviceInput::getDeviceName(unsigned int deviceId)
{
    if ( deviceId >= m_count ) return NULL;

    HRESULT hr = S_OK;
    WCHAR *devName = NULL;
    UINT32 length = 0;

    hr = m_devices[deviceId]->GetAllocatedString( MF_DEVSOURCE_ATTRIBUTE_FRIENDLY_NAME, &devName, &length );
    if ( FAILED(hr) ) return NULL;

    return devName;
}

void deviceInput::Clear()
{
    if ( m_devices ) {
        for (UINT32 i = 0; i < m_count; i++) SafeRelease( &m_devices[i] );
        CoTaskMemFree( m_devices );
    }
    m_devices = NULL;
    m_count = 0;
}

HRESULT deviceInput::EnumerateDevices()
{
    HRESULT hr = S_OK;
    IMFAttributes *pAttributes = NULL;

    Clear();

    hr = MFCreateAttributes(&pAttributes, 1);
    if ( SUCCEEDED(hr) ) hr = pAttributes->SetGUID( MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE, m_source );
    if ( SUCCEEDED(hr) ) hr = MFEnumDeviceSources( pAttributes, &m_devices, &m_count );

    SafeRelease( &pAttributes );

    return hr;
}

To grab audio or camera capture devices, I specify either MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_AUDCAP_GUID or MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID, and that works with no problem: I can grab the names of the devices as well as the IMFActivate. I have code to record the webcam to an output video file; however, I'm having a tough time figuring out how to record the audio to a file. I'm under the impression that I need to use an IMFSinkWriter, but I can't find any examples that use an audio capture IMFActivate with an IMFSinkWriter.
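For context, usage looks roughly like this (a trimmed sketch - error handling omitted, and the VIDCAP GUID gets swapped in when I enumerate cameras):

deviceInput audioDevices( MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_AUDCAP_GUID );

int count = audioDevices.listDevices();
for ( int i = 0; i < count; i++ ) {
    const WCHAR *name = audioDevices.getDeviceName( i );
    wprintf( L"%d: %s\n", i, name ? name : L"(unknown)" );
    CoTaskMemFree( (LPVOID)name );   // string from GetAllocatedString is freed by the caller
}

IMFActivate *mic = audioDevices.getDevice( 0 );   // AddRef'd - Release it when done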

I'm not much of a Windows API programmer, so I'm sure there's a fairly straightforward answer, but COM stuff is just a bit over my head. As far as the audio format goes, I don't really care, as long as it gets into a file - it can be WAV, WMA, or whatever. Even though I'm recording video, I need the video and audio files to be separate, so I can't just add the audio into my video encoding.

OzBarry

2 Answers


I apologize for the late response, and I hope you can still find this valuable. I recently completed a project similar to yours (recording webcam video along with a selected microphone to a single video file with audio). The key is to create an aggregate media source.

// http://msdn.microsoft.com/en-us/library/windows/desktop/dd388085(v=vs.85).aspx
HRESULT CreateAggregateMediaSource(IMFMediaSource *videoSource,
                                   IMFMediaSource *audioSource,
                                   IMFMediaSource **aggregateSource)
{
    *aggregateSource = nullptr;
    IMFCollection *pCollection = nullptr;

    HRESULT hr = ::MFCreateCollection(&pCollection);

    if (S_OK == hr)
        hr = pCollection->AddElement(videoSource);

    if (S_OK == hr)
        hr = pCollection->AddElement(audioSource);

    if (S_OK == hr)
        hr = ::MFCreateAggregateSource(pCollection, aggregateSource);

    SafeRelease(&pCollection);
    return hr;
}
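
For completeness, here is roughly how the two capture sources could be created from the IMFActivate objects you already enumerate and then fed into that helper (a sketch only - videoActivate and audioActivate stand in for whatever your getDevice() calls return, and error handling is compressed):

IMFMediaSource  *videoSource     = nullptr;
IMFMediaSource  *audioSource     = nullptr;
IMFMediaSource  *aggregateSource = nullptr;
IMFSourceReader *reader          = nullptr;

// Activate the enumerated devices into real media sources.
HRESULT hr = videoActivate->ActivateObject(IID_PPV_ARGS(&videoSource));

if (S_OK == hr)
    hr = audioActivate->ActivateObject(IID_PPV_ARGS(&audioSource));

// Combine them, then read from the combined source with a single source reader.
if (S_OK == hr)
    hr = CreateAggregateMediaSource(videoSource, audioSource, &aggregateSource);

if (S_OK == hr)
    hr = ::MFCreateSourceReaderFromMediaSource(aggregateSource, nullptr, &reader);

// Release/Shutdown the sources and the reader when you are finished recording.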

When configuring the sink writer, you will add 2 streams (one for audio and one for video). Of course, you will also configure the writer correctly for the input stream types.

HRESULT        hr                  = S_OK;
IMFMediaType  *videoInputType      = nullptr;
IMFMediaType  *videoOutputType     = nullptr;
IMFMediaType  *audioInputType      = nullptr;
IMFMediaType  *audioOutputType     = nullptr;
DWORD          videoOutStreamIndex = 0u;
DWORD          audioOutStreamIndex = 0u;
IMFSinkWriter *writer              = nullptr;

// [other create and configure writer]

if (S_OK == hr)
    hr = writer->AddStream(videoOutputType, &videoOutStreamIndex);

// [more configuration code]

if (S_OK == hr)
    hr = writer->AddStream(audioOutputType, &audioOutStreamIndex);
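
Just to illustrate what the elided audio configuration might look like (a sketch, not the exact types from my project - it fills in the audioOutputType declared above with an AAC output stream, and the attribute values are only examples):

// Illustrative only: one possible audio output type for the writer's audio stream.
if (S_OK == hr)
    hr = ::MFCreateMediaType(&audioOutputType);

if (S_OK == hr)
    hr = audioOutputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
if (S_OK == hr)
    hr = audioOutputType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_AAC);
if (S_OK == hr)
    hr = audioOutputType->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, 44100);
if (S_OK == hr)
    hr = audioOutputType->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, 2);
if (S_OK == hr)
    hr = audioOutputType->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, 16);
if (S_OK == hr)
    hr = audioOutputType->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, 16000); // ~128 kbps

You would build the matching input types from what the source reader reports for each stream and hand them to writer->SetInputMediaType before calling BeginWriting.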

Then, when reading the samples, you will need to pay close attention to the reader's streamIndex and send them to the writer appropriately. You will also need to pay close attention to the format that the codec expects - for instance, IEEE float vs. PCM, etc. Good luck, and I hope it is not too late.
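
In case it helps, here is a compressed sketch of that loop. It assumes the reader, the writer, and the two stream indices from the snippets above are already set up, and it hard-codes the assumption that stream 0 of the aggregate source is video and stream 1 is audio; a real loop would inspect the media types instead, rebase timestamps to start at zero, and handle gaps and stream ticks.

// Pull samples from the source reader and route each one to the matching writer stream.
bool recording = true;

while (recording && S_OK == hr)
{
    DWORD streamIndex = 0;
    DWORD streamFlags = 0;
    LONGLONG timestamp = 0;
    IMFSample *pSample = nullptr;

    hr = reader->ReadSample(MF_SOURCE_READER_ANY_STREAM, 0,
                            &streamIndex, &streamFlags, &timestamp, &pSample);

    if (S_OK == hr && (streamFlags & MF_SOURCE_READERF_ENDOFSTREAM))
        recording = false;

    if (S_OK == hr && pSample)
    {
        // Map the reader's stream index to the writer stream created by AddStream.
        DWORD writerStream = (streamIndex == 0) ? videoOutStreamIndex : audioOutStreamIndex;
        hr = writer->WriteSample(writerStream, pSample);
    }

    SafeRelease(&pSample);
}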

Jeff
  • It's been a long time since I've been working on that project, and it has since been taken over by some other people who took it in a different direction. Regardless, thanks for the clear answer with the example code, it's much appreciated, and perhaps someone else will find it very useful :) – OzBarry Apr 03 '14 at 18:20

Did you have a hard time managing DirectShow audio capture in Record directshow audio device to file?

Capturing with Media Foundation is hardly any simpler. Not to mention that, in general, there are a lot more resources on DirectShow out there.

MSDN offers a WavSink Sample that implements audio capture into a file:

Shows how to implement a custom media sink in Microsoft Media Foundation. The sample implements an archive sink that writes uncompressed PCM audio to a .wav file.

I am not sure why they decided not to make this a standard component. With Media Foundation being inferior to DirectShow in many ways, they could at least have made this small thing an advantage. Anyway, you have the sample, and it looks like a good start.
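
If it helps, here is a rough sketch of wiring a real audio capture source into such a custom sink through a topology. The media sink itself comes from the sample (create it with whatever creation function the sample exposes), error handling is compressed, and the finished topology would then be set on an IMFMediaSession and started.

HRESULT BuildWavCaptureTopology(IMFMediaSource *pAudioSource,   // activated capture device
                                IMFMediaSink   *pWavSink,       // sink from the WavSink sample
                                IMFTopology   **ppTopology)
{
    IMFTopology *pTopology = NULL;
    IMFPresentationDescriptor *pPD = NULL;
    IMFStreamDescriptor *pSD = NULL;
    IMFStreamSink *pStreamSink = NULL;
    IMFTopologyNode *pSourceNode = NULL;
    IMFTopologyNode *pOutputNode = NULL;
    BOOL selected = FALSE;

    *ppTopology = NULL;

    HRESULT hr = MFCreateTopology(&pTopology);
    if (SUCCEEDED(hr)) hr = pAudioSource->CreatePresentationDescriptor(&pPD);
    if (SUCCEEDED(hr)) hr = pPD->SelectStream(0);
    if (SUCCEEDED(hr)) hr = pPD->GetStreamDescriptorByIndex(0, &selected, &pSD);
    if (SUCCEEDED(hr)) hr = pWavSink->GetStreamSinkByIndex(0, &pStreamSink);

    // Source node: the capture device's audio stream.
    if (SUCCEEDED(hr)) hr = MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &pSourceNode);
    if (SUCCEEDED(hr)) hr = pSourceNode->SetUnknown(MF_TOPONODE_SOURCE, pAudioSource);
    if (SUCCEEDED(hr)) hr = pSourceNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, pPD);
    if (SUCCEEDED(hr)) hr = pSourceNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, pSD);
    if (SUCCEEDED(hr)) hr = pTopology->AddNode(pSourceNode);

    // Output node: the WAV sink's stream sink.
    if (SUCCEEDED(hr)) hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNode);
    if (SUCCEEDED(hr)) hr = pOutputNode->SetObject(pStreamSink);
    if (SUCCEEDED(hr)) hr = pTopology->AddNode(pOutputNode);

    // Connect capture output 0 to sink input 0; the session resolves the rest.
    if (SUCCEEDED(hr)) hr = pSourceNode->ConnectOutput(0, pOutputNode, 0);

    if (SUCCEEDED(hr))
    {
        *ppTopology = pTopology;
        (*ppTopology)->AddRef();
    }

    SafeRelease(&pOutputNode);
    SafeRelease(&pSourceNode);
    SafeRelease(&pStreamSink);
    SafeRelease(&pSD);
    SafeRelease(&pPD);
    SafeRelease(&pTopology);
    return hr;
}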

Roman R.
  • Yeah, I noticed the WavSink example; the issue there is that it's actually a transcoder: it takes a PCM audio file and converts it into a *.wav file, so it doesn't really tell me how to get the audio data directly from a device. I was using DirectShow, but my boss heavily encouraged me (read: told me) to use Media Foundation. – OzBarry Oct 16 '12 at 15:00
  • It gives you the most important piece anyway. Yes, you need to make the topology use a real audio capture device instead of transcoding. – Roman R. Oct 16 '12 at 15:04