
Hi Sarem

Background

I have an application that detects when somebody says 'Hi Sarem' as a kind of electronic lock. I wanted to do something like 'Hi Siri' but since that is taken I went for something a bit different, like 'Hi Sarem'.

Implementation

The code samples audio from the mic, runs an FFT on it and then checks for three consecutive frequencies, so you could also trigger it by, for example, whistling or playing the correct three notes on a piano. Those frequencies need to be detected within a certain time of one another and are configurable using the sliders. The code contains the parameters you need to set the timings, tolerances and so on. The three sliders represent the three 'notes' in 'Hi-Sa-rem'.
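
In essence the matching boils down to the following (a stripped-down sketch of the logic in the full code further down; the helper names are just for illustration, and 44100 Hz and the 10% tolerance are the values I use):

// Map an FFT bin to a frequency and check it against a target 'note'
// within the tolerance band
static double binToFrequency ( int bin, int n )
{
    return 44100.0 * bin / n;   // e.g. bin 23 of 1024 -> ~990 Hz
}

static BOOL noteMatches ( double f, double want )
{
    double wLo = want * ( 1 - TOL );
    double wHi = want * ( 1 + TOL );

    return f >= wLo && f <= wHi;
}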

UI

The image here gives an idea of the UI. As each relevant frequency is detected the corresponding bullet turns red, and once the whole sequence is detected the big one turns red. The slider at the top acts as a monitor, continuously showing the frequency 'heard', so you can use it to calibrate the notes.

Hi Sarem UI

Problem

I have a few problems with this. Accuracy is a big one but not the primary one. (I think if I had a scarier mama this might have been more accurate and also done by lunch but that is another story ...)

So here goes - the primary problem.

This works decently on a device, but on a simulator I get the following in the log

2020-07-26 18:47:13.543219+0200 HiSarem[68826:1238118] [plugin] AddInstanceForFactory: No factory registered for id <CFUUID 0x600000788320> F8BB1C28-BAE8-11D6-9C31-00039315CD46
2020-07-26 18:47:13.575866+0200 HiSarem[68826:1238118] No exclusivity (null)

I suspect it has to do with access rights but I am not sure. I looked everywhere I know of, but it does not make sense to me that the error would complain about a factory not being registered. Also, why does it work on the device and not in the simulator? I do print out that I could not get exclusive access to the device, but even without requesting or locking the mic I still get the problem.

Code

This comes from the default ViewController that a single view app gives you, and the UI section above describes how it is hooked up to the interface. So you should be able to paste it straight into a project and run it if you need to. This is a bit of a test project and not refined, but in the spirit of an MRE you have all the code.

#import <AVKit/AVKit.h>
#import <Accelerate/Accelerate.h>

#import "ViewController.h"

// Amplitude threshold
#define THRESHOLD    500

// Maximum frequency
#define MAXFREQ     7000

// Tolerance (% so 0.1 is 10%)
#define TOL          0.1

// Reset if no match within so many millis
#define RESETMIL    1500
#define BIGRESETMIL 5000

@interface ViewController () < AVCaptureAudioDataOutputSampleBufferDelegate >

@property (weak, nonatomic) IBOutlet UISlider  * monitorSlider;
@property (weak, nonatomic) IBOutlet UISlider  * phrase1Slider;
@property (weak, nonatomic) IBOutlet UISlider  * phrase2Slider;
@property (weak, nonatomic) IBOutlet UISlider  * phrase3Slider;

@property (weak, nonatomic) IBOutlet UILabel   * phrase1Label;
@property (weak, nonatomic) IBOutlet UILabel   * phrase2Label;
@property (weak, nonatomic) IBOutlet UILabel   * phrase3Label;
@property (weak, nonatomic) IBOutlet UILabel   * successLabel;

@property (nonatomic)         BOOL               busy;
@property (nonatomic, strong) AVCaptureSession * avSession;
@property (nonatomic, strong) AVCaptureInput   * avInput;
@property (nonatomic, strong) AVCaptureDevice  * avDevice;
@property (nonatomic, strong) AVCaptureOutput  * avOutput;

@property (nonatomic) double   prevF;
@property (nonatomic) NSDate * prevTime;

@end

@implementation ViewController

+ ( NSString * ) offText
{
    return @"⚫️";
}

+ ( NSString * ) onText
{
    return @"";
}

// See if we can turn on for a given frequency
- ( BOOL ) turnOn:( double ) f
         want:( double ) w
{
    double wLo = w * ( 1 - TOL );
    double wHi = w * ( 1 + TOL );

    return self.prevF < wLo && f >= wLo && f <= wHi;
}

// Update the value
- ( void ) measure:( int    ) s
         n:( int    ) n
{
    // Convert
    double f = 44100.0 * s / n;

    if ( f <= MAXFREQ )
    {
        self.monitorSlider.value = f;

        // See where we are with the sliders
        if ( [self.phrase1Label.text isEqualToString:ViewController.offText] )
        {
            // See if we can turn on 1
            if ( [self turnOn:f want:self.phrase1Slider.value] )
            {
                self.phrase1Label.text = ViewController.onText;

                // Match
                self.prevTime = NSDate.date;
            }
        }
        else if ( [self.phrase2Label.text isEqualToString:ViewController.offText] )
        {
            // See if we can turn on 2
            if ( [self turnOn:f want:self.phrase2Slider.value] )
            {
                self.phrase2Label.text = ViewController.onText;

                // Match
                self.prevTime = NSDate.date;
            }
        }
        else if ( [self.phrase3Label.text isEqualToString:ViewController.offText] )
        {
            // See if we can turn on 3
            if ( [self turnOn:f want:self.phrase3Slider.value] )
            {
                self.phrase3Label.text = ViewController.onText;
                self.successLabel.text = ViewController.onText;

                // Big match
                self.prevTime = NSDate.date;
            }
        }
    }

    // Reset if we do not get a match fast enough
    if ( self.prevTime )
    {
        NSTimeInterval d = [NSDate.date timeIntervalSinceDate:self.prevTime] * 1000;

        if ( d > RESETMIL )
        {
            self.phrase1Label.text = ViewController.offText;
            self.phrase2Label.text = ViewController.offText;
            self.phrase3Label.text = ViewController.offText;
        }
        if ( d > BIGRESETMIL )
        {
            self.successLabel.text = ViewController.offText;
        }
    }
}

- ( void ) viewDidLoad
{
    super.viewDidLoad;
}

- ( void ) viewDidAppear:(BOOL)animated
{
    [super viewDidAppear:animated];

    if ( self.requestPermission )
    {
        self.startCapture;
    }
}

- ( void ) viewWillDisappear:(BOOL)animated
{
    [super viewWillDisappear:animated];

    if ( self.avSession )
    {
        self.avSession.stopRunning;
        self.avSession = nil;
    }
}

- ( BOOL ) requestPermission
{
    if ( AVAudioSession.sharedInstance.recordPermission == AVAudioSessionRecordPermissionGranted )
    {
        return YES;
    }
    else if ( AVAudioSession.sharedInstance.recordPermission == AVAudioSessionRecordPermissionDenied )
    {
        UIAlertController * alert = [UIAlertController alertControllerWithTitle:@"No ears"
                                        message:@"I can not hear you - please change it quickly"
                                     preferredStyle:UIAlertControllerStyleAlert];

        [alert addAction:[UIAlertAction actionWithTitle:@"Apologies"
                              style:UIAlertActionStyleDefault
                            handler:nil]];

        [self presentViewController:alert
                   animated:YES
                 completion:nil];

        return NO;
    }
    else
    {
        [AVAudioSession.sharedInstance requestRecordPermission:^ ( BOOL granted ) {

            if ( granted )
            {
                self.startCapture;
            }
            
        }];

        return NO;
    }
}

- ( void ) startCapture
{
    if ( ! self.busy )
    {
        self.busy = YES;
        
        // Create the capture session.
        NSError          * avErr;
        AVCaptureSession * captureSession = [[AVCaptureSession alloc] init];
        
        // Default anyhow
        captureSession.sessionPreset = AVCaptureSessionPresetHigh;

        // Lookup the default audio device.
        AVCaptureDevice * audioDevice = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];

        if ( [audioDevice lockForConfiguration: & avErr] )
        {
            // Wrap the audio device in a capture device input.
            AVCaptureDeviceInput * audioInput = [AVCaptureDeviceInput deviceInputWithDevice:audioDevice
                                                  error: & avErr];
            
            audioDevice.unlockForConfiguration;

            if ( audioInput )
            {
                // If the input can be added, add it to the session.
                if ( [captureSession canAddInput:audioInput] )
                {
                    [captureSession addInput:audioInput];
                    
                    AVCaptureAudioDataOutput * audioOutput = [[AVCaptureAudioDataOutput alloc] init];
                    
                    if ( [captureSession canAddOutput:audioOutput] )
                    {
                        [audioOutput setSampleBufferDelegate:self
                                           queue:dispatch_queue_create ( "ears", NULL )];
                        [captureSession addOutput:audioOutput];

                        // Do on background
                        dispatch_async ( dispatch_queue_create ( "spotty", NULL ), ^ {
                            
                            NSLog ( @"Come to papa" );
                            captureSession.startRunning;
                            
                            // Done
                            dispatch_async ( dispatch_get_main_queue (), ^ {
                                
                                self.busy      = NO;
                                self.avSession = captureSession;
                                self.avDevice  = audioDevice;
                                self.avInput   = audioInput;
                                self.avOutput  = audioOutput;
                                
                            } );
                        } );
                    }
                    else
                    {
                        NSLog ( @"Not today : add output" );
                        self.busy = NO;
                    }
                }
                else
                {
                    NSLog( @"Sorry : add input" );
                    self.busy = NO;
                }
            }
            else
            {
                NSLog( @"Ooops %@", avErr );
                self.busy = NO;
            }
        }
        else
        {
            NSLog( @"No exclusivity %@", avErr );
            self.busy = NO;
        }
    }
}

#pragma mark -
#pragma mark Audio capture delegate

- ( void ) captureOutput:( AVCaptureOutput     * ) output
   didOutputSampleBuffer:( CMSampleBufferRef     ) sampleBuffer
      fromConnection:( AVCaptureConnection * ) connection
{
    CMItemCount n = CMSampleBufferGetNumSamples ( sampleBuffer );

    // We have our standards
    if ( n == 1024 )
    {
        AudioBufferList audioBufferList;
        
        CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer (
                                     sampleBuffer,
                                     NULL,
                                     & audioBufferList,
                                     sizeof ( audioBufferList ),
                                     NULL,
                                     NULL,
                                     kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
                                     & sampleBuffer
                                     );
        
        // Loop buffers
        for ( int b = 0; b < audioBufferList.mNumberBuffers; b ++ )
        {
            // Evaluate samples
            [self fft:audioBufferList.mBuffers [ b ].mData];
        }

        // Release the baby ... I mean buffer
        CFRelease ( sampleBuffer );
    }
}

- ( void ) fft:( SInt16 * ) samples
{
    // In place so r and i are both input and output
    COMPLEX_SPLIT c;

    float r [ 512 ];
    float i [ 512 ];

    c.realp = r;
    c.imagp = i;

    // Load it and calculate maximum amplitude along the way
    int amp = 0;

    for ( int s = 0; s < 512; s ++ )
    {
        SInt16 ev = samples [ s * 2     ];
        SInt16 od = samples [ s * 2 + 1 ];
    
        // Convert to float
        r [ s ] = ( float ) ev;
        i [ s ] = ( float ) od;

        if ( amp < ev )
        {
            amp = ev;
        }
        if ( amp < od )
        {
            amp = od;
        }
    }

    // Only proceed if we have a big enough amplitude
    if ( amp > THRESHOLD )
    {
        FFTSetup fft = vDSP_create_fftsetup ( 10, kFFTRadix2 );
        
        if ( fft )
        {
            // FFT!
            vDSP_fft_zrip ( fft, & c, 1, 10, FFT_FORWARD );
            
            // Get frequency
            int   maxS = 0;
            float maxF = 0;

            for ( int s = 1; s < 512; s ++ )
            {
                float f = r [ s ] * r [ s ] + i [ s ] * i [ s ];
                
                if ( f > maxF )
                {
                    maxF = f;
                    maxS = s;
                }
            }

            // Dealloc
            vDSP_destroy_fftsetup ( fft );

            // Done
            dispatch_async ( dispatch_get_main_queue (), ^ {
                
                [self measure:maxS
                        n:1024];

            } );
        }
    }
}

@end

Why does this work well on a device but refuse to work on a simulator?

Then, a secondary question, since I gave all the detail here: any ideas on how to improve the accuracy, or will that only be accomplished by using more frequency triggers?

TIA

skaak

1 Answer


Welcome to the world of debugging with real devices only, because audio is involved and the simulator can be picky about this.

Keep in mind that you want your AVCapture* pointers set to nil/NULL before assigning anything to them. Audio is C business, and Objective-C is not the ideal language for calling methods that have to do buffer work fast, fast, fast, even though it works. Nothing new yet.

Also, you may want a device before opening any session, so the AVCaptureSession can come after the AVCaptureDevice is obtained. I know the docs tell you the opposite, but you don't need a session when there is no device, right? :)
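
So something roughly like this, just a sketch of that order:

// get the device first, bail out if there is none,
// and only then create the session
AVCaptureDevice *audioDevice = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
if (audioDevice == nil) {
    NSLog(@"no audio device, nothing to capture");
    return;
}

AVCaptureSession *captureSession = [[AVCaptureSession alloc] init];
captureSession.sessionPreset = AVCaptureSessionPresetHigh;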

When writing inside dispatch_async(...), use self->_busy instead of self.busy. And dispatch_async(dispatch_get_main_queue(), ^{}) is thread business; place it where it belongs, around the access to UIKit stuff, for example inside -(void)measure:(int)s n:(int)n.
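
A sketch of what that could look like inside -measure:n: (only the UIKit access hops onto the main queue, and the ivars are used directly inside the block):

double f = 44100.0 * s / n;

if (f <= MAXFREQ) {
    dispatch_async(dispatch_get_main_queue(), ^{
        // UIKit only on the main queue
        self->_monitorSlider.value = f;
        // ... and update the phrase labels here as well ...
    });
}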

And do yourself a favour and change the Objective-C -(void)fft:(SInt16 *)samples; to

void fft(SInt16* samples, int *result) {
    //do fast fourier transformation
}

If you need access to self inside this function, you are probably doing something close to wrong; avoid Objective-C method calls on audio threads. If you really do need it, you could pass a void * pointer to the function to make self accessible from inside it, or pass a pointer so the function can write its result into a given variable, or simply let it return the result.
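
Put together, the C version could look roughly like this (just a sketch along the lines of your -fft: method; the dominant bin goes out through the result pointer instead of calling back into self):

void fft(SInt16 *samples, int *result) {
    // in place, so r and i are both input and output
    COMPLEX_SPLIT c;

    float r[512];
    float i[512];

    c.realp = r;
    c.imagp = i;

    // load the samples and track the maximum amplitude along the way
    SInt16 amp = 0;

    for (int s = 0; s < 512; s++) {
        SInt16 ev = samples[s * 2    ];
        SInt16 od = samples[s * 2 + 1];

        r[s] = (float)ev;
        i[s] = (float)od;

        amp = MAX(amp, MAX(ev, od));
    }

    *result = 0;

    // only bother with the FFT if the amplitude is big enough
    if (amp > THRESHOLD) {
        // ideally create this setup once and reuse it
        FFTSetup setup = vDSP_create_fftsetup(10, kFFTRadix2);

        if (setup) {
            vDSP_fft_zrip(setup, &c, 1, 10, FFT_FORWARD);

            // pick the bin with the largest magnitude
            float maxF = 0;

            for (int s = 1; s < 512; s++) {
                float m = r[s] * r[s] + i[s] * i[s];

                if (m > maxF) {
                    maxF = m;
                    *result = s;
                }
            }

            vDSP_destroy_fftsetup(setup);
        }
    }
}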

And ignore this specific simulator warning. It just says that an instance is added for the factory because there was none yet with that CFUUID. It is not a bug; it happens because you run the AV* iOS stuff on the Simulator, which is macOS of course.

Some tiny changes: your float conversion could look like this.

SInt16 amp = 0;
int s=0;
SInt16 evens;
SInt16 odds;
while ( s < 512 ) {
    evens = samples[s * 2    ];
    odds  = samples[s * 2 + 1];
    r[s] = (float)evens;
    i[s] = (float)odds;
    amp = MAX(amp,MAX(odds,evens));
    s++;
}

And in the delegate method -captureOutput:didOutputSampleBuffer:fromConnection:

CMItemCount numSamplesInBuffer = CMSampleBufferGetNumSamples(sampleBuffer);
// works only with 1024 samples
if ( numSamplesInBuffer == 1024 ) {
    AudioBufferList audioBufferList;
    CMBlockBufferRef buffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer,
                                 NULL,
                                 &audioBufferList,
                                 sizeof(audioBufferList),
                                 NULL,
                                 NULL,
                                 kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
                                 &buffer //now its correct pointer
                                 );

    //provide variable for feedback
    int result = 0;

    // Loop buffers
    int b = 0;
    for (; b < audioBufferList.mNumberBuffers; b ++) {
        // Evaluate samples
        // use C if possible, don't call ObjC in functions if possible
        fft(audioBufferList.mBuffers[b].mData, &result);
    }
    // later Release the baby ... I mean buffer <- yes buffer :)
    CFRelease(buffer);
    
    [self measure:result n:1024];
}
Ol Sen
  • great great great - thanks, all the feedback is really valuable, but I was surprised you said nothing about the inside of fft. Sure, I get the self-improvement / C pointers bit, but this is also my first time using AV and DSP and I am very suspicious about the way I did the FFT, especially the loop where I convert the int audio to complex float ... any comments there? tx !!! – skaak Jul 27 '20 at 00:14
  • just sitting reading the paste/copy in xcode. making the changes i wrote.. :D – Ol Sen Jul 27 '20 at 00:52
  • Thanks for the night shift updates - nice! That buffer bug you fixed caused some errors I now understand. I was going to do this in Swift initially but got the idea from a now-deleted question that specifically mentioned Objective-C. But based on performance I noted (in another question that is now deleted) I did not think C would be necessary here, but everything you say is performance and C. Wow. I guess this will make it near impossible for anybody but Apple themselves to implement something like Siri given the hardware tweaks you'll need and you really need that extra 10% that C gives you? – skaak Jul 27 '20 at 04:57
  • It's not only valuable for audio. Doing MIDI analytics and visualising them in real time with MetalKit shaders, I have measured more than a 15% speed difference between ObjC and C for re-filling my buffers and calling encode for a new frame. I tried going along with Swift for that and it became worse, just handling strides and stripes, pointer types with long names because * is not an existing concept. Every Xcode version introduces so many new language features, which tells me Swift is far from ready. Just saying. Metal shaders are actually C syntax, even though so many companies think Swift is the ultimate thing. – Ol Sen Jul 27 '20 at 12:35
  • I actually still do all my important stuff in C. Datatype heaven or hell, depends on your viewpoint. When I see Swift it baffles me, because the function names and classes become less readable to me; each function needs public and override information with even worse grammar (_ hello: Hello, iseventually but:NOT, cute:?nil). So I watched myself and found I actually write more useless stuff in Swift :) and the stuff I fell in love with is still ObjC, which just works together with C or C++. – Ol Sen Jul 27 '20 at 12:57
  • Ol Sen - I cut my teeth in C and it was / is definitely heaven !!!!! There was a time when you could do miracles with a void pointer. Then C++ came along and everything had to be safe and declared. C++ initially was like Swift - lots of changes between versions. Eventually I became Java guy for many years but with Objective-C you have best of both worlds. Close to the metal, void pointers and C when you need it and ARC to watch your back. Whenever I try to swallow Swift it leaves such a bad taste in the mouth I go back to Objective-C. But Apple is pushing so hard that there is no more room ... – skaak Jul 27 '20 at 13:43
  • http://ankit.im/swift/2016/05/21/creating-objc-cpp-packages-with-swift-package-manager/ – Ol Sen Jul 27 '20 at 17:16
  • You pushing me towards Swift too? One day I will probably need that ... thanks for the link. – skaak Jul 31 '20 at 09:44
  • I've worked on this some more, added the periodogram and applied it to my own stuff as well, which is (economic) time series data with a sample size that is not a power of 2. Then I calculate a periodogram using my own DFT and also the FFT, more or less as here. Afterwards I *stretch* the periodogram because I had to zero-pad the input series. Anyhow, I note noise at the short frequencies / long periods when I use the FFT on the padded sample. You have any comments on that? – skaak Jul 31 '20 at 09:45
  • My own take on it is that, because the FFT interpolates, I should throw away those short frequencies or interpolate better ... but given your extensive experience in this field, I am just wondering what your take on it would be? – skaak Jul 31 '20 at 09:50
  • Haha, no, not pushing you to Swift. Actually just showing that the package manager can handle C, C++ and ObjC/++ as well. – Ol Sen Jul 31 '20 at 12:19
  • Actually, anything that simplifies the audio fingerprint so much that you can query it against pre-stored fingerprint entries makes it work. You could start a startup and name it shezalamabim or so.. :D – Ol Sen Jul 31 '20 at 14:41
  • The heavy science lies in querying all the possible simplified wave-buffer keys in a way that is fast. Generating the fingerprint out of the audio is actually less complex than often thought. – Ol Sen Jul 31 '20 at 22:58
  • Trouble with Swift - I've always tried to learn it from the manual ... then it is horrible, all the fluffy stuff about optionals and all the sugar makes you sick ... but just taking a project and starting with it makes it much easier to swallow ... – skaak Aug 27 '20 at 05:47
  • Yep.. especially when you want to keep up fast buffering of real-time data for use in Metal shaders - back in Metal, so to speak C - and do that on three scenes at the same time. Pointers to strides that index a stripe of pointers that buffers an array that is nil and so on.. – Ol Sen Aug 28 '20 at 03:51
  • Swift is nice but I already run into issues ... I think I'll try to be Ol Sen and migrate my old Obj-C stuff to super fast C using lots of pointers rather than to Swift. – skaak Aug 28 '20 at 07:12