EDIT: In the end I did exactly what I describe below: AVAudioRecorder for recording the speech and OpenAL for the pitch shift and playback. It worked out quite well.

I have a question about recording, modifying and playing back audio. I asked a similar question before (Record, modify pitch and play back audio in real time on iOS), but I now have more information and could do with some further advice.

So firstly this is what I am trying to do (on a separate thread to the main thread):

  1. Monitor the iPhone mic.
  2. Check for sound louder than a certain volume.
  3. If the level is above the threshold, start recording (e.g. a person starts talking).
  4. Continue recording until the volume drops below the threshold (e.g. the person stops talking).
  5. Modify the pitch of the recorded sound.
  6. Play back the sound.

I was thinking of using AVAudioRecorder to monitor and record the sound; there is a good tutorial here: http://mobileorchard.com/tutorial-detecting-when-a-user-blows-into-the-mic/

I was also thinking of using OpenAL to modify the pitch of the recorded audio.

So my question is: is my thinking in the list above correct, am I missing something, or is there a better/easier way to do it? Can I avoid mixing audio libraries and just use AVFoundation to change the pitch too?

Vadim Kotov
bennythemink

2 Answers

You can use either AVAudioRecorder or something lower-level, such as the Remote I/O audio unit.

The concept of 'volume' is pretty vague. You might want to look at the difference between peak and RMS values, and at how to integrate an RMS value over a given time (say 300 ms, which is what a VU meter uses).
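To make the peak/RMS distinction concrete, here is a minimal sketch that computes both for one block of samples. The function names `peakLevel` and `rmsLevel` are made up for illustration, and the samples are assumed to be floats in the range -1.0 to 1.0.

```cpp
#include <cmath>
#include <cstddef>

// Peak: the largest absolute sample value in the block.
float peakLevel(const float *samples, std::size_t count)
{
    float peak = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
    {
        float magnitude = std::fabs(samples[i]);
        if (magnitude > peak)
            peak = magnitude;
    }
    return peak;
}

// RMS: the square root of the mean of the squared samples.
float rmsLevel(const float *samples, std::size_t count)
{
    if (count == 0)
        return 0.0f;

    double sumOfSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i)
        sumOfSquares += static_cast<double>(samples[i]) * samples[i];

    return static_cast<float>(std::sqrt(sumOfSquares / count));
}
```

A single loud click in an otherwise silent block gives a high peak but a low RMS, which is why RMS tends to track perceived loudness of speech better than peak does.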

Basically you sum all the squares of the values and divide by the number of samples. You would take the square root and convert to dBFS with 20 * log10f(sqrtf(sum/num_samples)), but you can do that without the sqrt in one step with 10 * log10f(sum/num_samples).
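As a quick check of the dB conversion, this sketch shows the two equivalent forms: 20 * log10 applied to an RMS amplitude, and 10 * log10 applied to the mean of the squares. The function names are hypothetical, and full scale is assumed to be an amplitude of 1.0.

```cpp
#include <cmath>

// dBFS from an RMS amplitude (the square root has already been taken).
float dbFromRms(float rms)
{
    return 20.0f * std::log10(rms);
}

// dBFS straight from the mean of the squares; no square root needed,
// because log10(sqrt(x)) is half of log10(x).
float dbFromMeanSquare(float meanSquare)
{
    return 10.0f * std::log10(meanSquare);
}
```

For example, a mean square of 0.25 corresponds to an RMS of 0.5, and both functions give roughly -6.02 dBFS. Note that an input of exactly zero (digital silence) yields negative infinity, so real code may want to clamp to a floor value.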

You'll need to do a lot of adjusting of integration times and thresholds to get it to behave the way you want.
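One way to structure the threshold logic is a small gate with separate open and close thresholds plus a hold count, so that short pauses between words do not stop the recording. This is only a sketch: the class name `LevelGate`, its parameters, and any concrete threshold values are assumptions to be tuned, not something taken from an Apple API.

```cpp
// Hypothetical gate: opens when the level rises above openThresholdDb and
// closes once the level has stayed below closeThresholdDb for holdBlocks
// consecutive updates.
class LevelGate
{
public:
    LevelGate(float openThresholdDb, float closeThresholdDb, int holdBlocks)
    : _openThresholdDb(openThresholdDb)
    , _closeThresholdDb(closeThresholdDb)
    , _holdBlocks(holdBlocks)
    , _quietBlocks(0)
    , _open(false)
    {
    }

    // Feed one level reading in dB; returns true while recording should run.
    bool update(float levelDb)
    {
        if (!_open)
        {
            if (levelDb > _openThresholdDb)
            {
                _open = true;
                _quietBlocks = 0;
            }
        }
        else if (levelDb < _closeThresholdDb)
        {
            // count consecutive quiet readings before closing
            if (++_quietBlocks >= _holdBlocks)
                _open = false;
        }
        else
        {
            // loud enough again, so restart the hold count
            _quietBlocks = 0;
        }
        return _open;
    }

private:
    float _openThresholdDb;
    float _closeThresholdDb;
    int _holdBlocks;
    int _quietBlocks;
    bool _open;
};
```

Setting the close threshold a few dB below the open threshold gives hysteresis, which stops the gate from chattering when the level hovers around a single threshold.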

For pitch shifting, I think OpenAL will do the trick. The technique behind it is called band-limited interpolation: https://ccrma.stanford.edu/~jos/resample/Theory_Ideal_Bandlimited_Interpolation.html

This example shows an RMS calculation as a running average. The circular buffer maintains a history of squares, and eliminates the need to sum the squares every operation. I haven't run it so treat it as pseudo code ;)

Example:

#include <cmath>   // log10f

class VUMeter
{

protected:

    // samples per second
    float _sampleRate;

    // the integration time in seconds (vu meter is 300ms)
    float _integrationTime;

    // these maintain a circular buffer which contains
    // the 'squares' of the audio samples

    int _integrationBufferLength;
    float *_integrationBuffer;
    float *_integrationBufferEnd;
    float *_cursor;

    // this is a sort of accumulator to make a running
    // average more efficient

    float _sum;

public:

    VUMeter()
    : _sampleRate(48000.0f)
    , _integrationTime(0.3f)
    , _sum(0.)
    {
        // create a buffer of values to be integrated
        // e.g 300ms @ 48khz is 14400 samples

        _integrationBufferLength = (int) (_integrationTime * _sampleRate);

        // the trailing () zero-initialises every element
        _integrationBuffer = new float[_integrationBufferLength]();

        // set the pointers for our circular buffer

        _integrationBufferEnd = _integrationBuffer + _integrationBufferLength;
        _cursor = _integrationBuffer;

    }

    ~VUMeter()
    {
        delete[] _integrationBuffer;
    }

    float getRms(float *audio, int samples)
    {
        // process the samples
        // this part accumulates the 'squares'

        for (int i = 0; i < samples; ++i)
        {
            // get the input sample

            float s = audio[i];

            // remove the oldest value from the sum

            _sum -= *_cursor;

            // calculate the square and write it into the buffer

            float square = s * s;
            *_cursor = square;

            // add it to the sum

            _sum += square;

            // increment the buffer cursor and wrap

            ++_cursor;

            if (_cursor == _integrationBufferEnd)
                _cursor = _integrationBuffer;
        }

        // now convert the mean of the squares to dB;
        // 10 * log10 of the mean square equals 20 * log10 of the rms

        return 10.0f * log10f(_sum / _integrationBufferLength);
    }
};
my fat llama
  • hey my fat llama, thanks for the advice but pardon my ignorance I've absolutely no idea what all that RMS means :) a wikipedia search just confused me more lol. Can you point me to any good tutorials please? thanks again for your help. – bennythemink Feb 28 '11 at 23:07
  • at its simplest, peak is an instantaneous value, i.e. what is the value 'now', 'rms' is an average over time. http://en.wikipedia.org/wiki/Root_mean_square the first formula there is the only one you need to worry about. – my fat llama Mar 05 '11 at 03:01
  • http://en.wikipedia.org/wiki/DBFS this is the scale used to measure peak levels. http://en.wikipedia.org/wiki/VU_meter this is the meter used to measure rms levels (this is the one you want to implement). – my fat llama Mar 05 '11 at 03:08
  • if that still doesn't make any sense, the best source of info for this stuff is some of the earlier chapters of the yamaha sound reinforcement handbook. it's about live sound, but the audio concepts are universal. http://www.amazon.com/Sound-Reinforcement-Handbook-Yamaha-Products/dp/0881889008 – my fat llama Mar 05 '11 at 03:13

OpenAL resampling changes the pitch and the duration inversely. For example, a sound resampled to a higher pitch will play for a shorter amount of time, and thus faster.
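The relationship can be put into numbers with a small sketch. It assumes equal-tempered semitones (12 per octave, one octave being a factor of 2), and both function names are hypothetical.

```cpp
#include <cmath>

// Playback-rate factor for a shift of `semitones`
// (positive raises the pitch, negative lowers it).
double pitchFactorForSemitones(double semitones)
{
    return std::pow(2.0, semitones / 12.0);
}

// Duration of a clip after it has been resampled by `pitchFactor`.
double resampledDuration(double originalSeconds, double pitchFactor)
{
    return originalSeconds / pitchFactor;
}
```

So a 3 second recording shifted up one octave (factor 2) plays back in 1.5 seconds, and shifted down one octave (factor 0.5) it takes 6 seconds.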

hotpaw2