Hi Sarem
Background
I have an application that detects when somebody says 'Hi Sarem', as a kind of electronic lock. I wanted to do something like 'Hi Siri', but since that is taken I went for something a bit different: 'Hi Sarem'.
Implementation
The code samples audio from the mic, runs an FFT on it and then checks for three consecutive frequencies, so you could also trigger it by whistling or playing the right three notes on a piano. The frequencies need to occur within a certain time of one another and are configurable using the sliders. The code contains the parameters you need to set the timings, tolerances and so on. The three sliders represent the three 'notes' in 'Hi-Sa-rem'.
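For calibration it helps to keep the resolution in mind: with 1024-sample buffers at 44.1 kHz each FFT bin is roughly 43 Hz wide. This little helper (a sketch; the 44.1 kHz rate is an assumption the code below also bakes in rather than querying from the device) shows the bin-to-frequency conversion used throughout:
// Frequency of FFT bin s for an n-point FFT sampled at 44.1 kHz.
// E.g. s = 23, n = 1024 gives 44100.0 * 23 / 1024 ≈ 990.5 Hz.
static double binToFrequency ( int s, int n )
{
return 44100.0 * s / n;
}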
UI
The image here gives an idea of the UI. As the relevant frequencies are detected the bullets turn red, and once the whole sequence is detected the big one turns red too. The slider at the top acts as a monitor, continuously tracking the frequency being 'heard', so you can use it to calibrate the notes.
Problem
I have a few problems with this. Accuracy is a big one, but not the primary one. (I think if I had a scarier mama this might have been more accurate, and also done by lunch, but that is another story ...)
So here goes: the primary problem.
This works decently on a device, but on the simulator I get the following in the log:
2020-07-26 18:47:13.543219+0200 HiSarem[68826:1238118] [plugin] AddInstanceForFactory: No factory registered for id <CFUUID 0x600000788320> F8BB1C28-BAE8-11D6-9C31-00039315CD46
2020-07-26 18:47:13.575866+0200 HiSarem[68826:1238118] No exclusivity (null)
I suspect it has to do with access rights, but I am not sure. I have looked everywhere I know of, and it does not make sense to me that the error complains about a factory not being registered. Also, why does it work on the device but not in the simulator? I do print a message when I cannot get exclusive access to the device, but even without requesting or locking the mic I still get the problem.
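To help narrow it down, a quick diagnostic can log what the environment claims to have before the session is even set up; a sketch using only standard AVFoundation calls (nothing project-specific assumed):
// Log what this environment reports before touching the capture session
AVCaptureDevice * mic = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
NSLog ( @"Default audio device: %@", mic );
NSLog ( @"Record permission: %ld", ( long ) AVAudioSession.sharedInstance.recordPermission );
NSLog ( @"Available inputs: %@", AVAudioSession.sharedInstance.availableInputs );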
Code
This comes from the default ViewController that a single view app template provides, and the UI is hooked up to it as described above. So you should be able to paste it into a project and run it if you need to. This is a bit of a test project and not refined, but in the spirit of an MRE you have all the code.
#import <AVKit/AVKit.h>
#import <Accelerate/Accelerate.h>
#import "ViewController.h"
// Amplitude threshold
#define THRESHOLD 500
// Maximum frequency
#define MAXFREQ 7000
// Tolerance (% so 0.1 is 10%)
#define TOL 0.1
// Reset if no match within so many millis
#define RESETMIL 1500
#define BIGRESETMIL 5000
@interface ViewController () < AVCaptureAudioDataOutputSampleBufferDelegate >
@property (weak, nonatomic) IBOutlet UISlider * monitorSlider;
@property (weak, nonatomic) IBOutlet UISlider * phrase1Slider;
@property (weak, nonatomic) IBOutlet UISlider * phrase2Slider;
@property (weak, nonatomic) IBOutlet UISlider * phrase3Slider;
@property (weak, nonatomic) IBOutlet UILabel * phrase1Label;
@property (weak, nonatomic) IBOutlet UILabel * phrase2Label;
@property (weak, nonatomic) IBOutlet UILabel * phrase3Label;
@property (weak, nonatomic) IBOutlet UILabel * successLabel;
@property (nonatomic) BOOL busy;
@property (nonatomic, strong) AVCaptureSession * avSession;
@property (nonatomic, strong) AVCaptureInput * avInput;
@property (nonatomic, strong) AVCaptureDevice * avDevice;
@property (nonatomic, strong) AVCaptureOutput * avOutput;
@property (nonatomic) double prevF;
@property (nonatomic) NSDate * prevTime;
@end
@implementation ViewController
+ ( NSString * ) offText
{
return @"⚫️";
}
+ ( NSString * ) onText
{
return @"";
}
// See if we can turn on for a given frequency: f must fall in the
// tolerance band around w, having just risen into it from below
- ( BOOL ) turnOn:( double ) f
want:( double ) w
{
double wLo = w * ( 1 - TOL );
double wHi = w * ( 1 + TOL );
return self.prevF < wLo && f >= wLo && f <= wHi;
}
// Update the value
- ( void ) measure:( int ) s
n:( int ) n
{
// Convert the bin index to a frequency (44.1 kHz sample rate assumed)
double f = 44100.0 * s / n;
if ( f <= MAXFREQ )
{
self.monitorSlider.value = f;
// See where we are with the sliders
if ( [self.phrase1Label.text isEqualToString:ViewController.offText] )
{
// See if we can turn on 1
if ( [self turnOn:f want:self.phrase1Slider.value] )
{
self.phrase1Label.text = ViewController.onText;
// Match
self.prevTime = NSDate.date;
}
}
else if ( [self.phrase2Label.text isEqualToString:ViewController.offText] )
{
// See if we can turn on 2
if ( [self turnOn:f want:self.phrase2Slider.value] )
{
self.phrase2Label.text = ViewController.onText;
// Match
self.prevTime = NSDate.date;
}
}
else if ( [self.phrase3Label.text isEqualToString:ViewController.offText] )
{
// See if we can turn on 3
if ( [self turnOn:f want:self.phrase3Slider.value] )
{
self.phrase3Label.text = ViewController.onText;
self.successLabel.text = ViewController.onText;
// Big match
self.prevTime = NSDate.date;
}
}
}
// Reset if we do not get a match fast enough
if ( self.prevTime )
{
NSTimeInterval d = [NSDate.date timeIntervalSinceDate:self.prevTime] * 1000;
if ( d > RESETMIL )
{
self.phrase1Label.text = ViewController.offText;
self.phrase2Label.text = ViewController.offText;
self.phrase3Label.text = ViewController.offText;
}
if ( d > BIGRESETMIL )
{
self.successLabel.text = ViewController.offText;
}
}
// Remember this reading so turnOn:want: can detect a rising edge next time
self.prevF = f;
}
- ( void ) viewDidLoad
{
super.viewDidLoad;
}
- ( void ) viewDidAppear:(BOOL)animated
{
[super viewDidAppear:animated];
if ( self.requestPermission )
{
self.startCapture;
}
}
- ( void ) viewWillDisappear:(BOOL)animated
{
[super viewWillDisappear:animated];
if ( self.avSession )
{
self.avSession.stopRunning;
self.avSession = nil;
}
}
- ( BOOL ) requestPermission
{
if ( AVAudioSession.sharedInstance.recordPermission == AVAudioSessionRecordPermissionGranted )
{
return YES;
}
else if ( AVAudioSession.sharedInstance.recordPermission == AVAudioSessionRecordPermissionDenied )
{
UIAlertController * alert = [UIAlertController alertControllerWithTitle:@"No ears"
message:@"I cannot hear you - please change it quickly"
preferredStyle:UIAlertControllerStyleAlert];
[alert addAction:[UIAlertAction actionWithTitle:@"Apologies"
style:UIAlertActionStyleDefault
handler:nil]];
[self presentViewController:alert
animated:YES
completion:nil];
return NO;
}
else
{
[AVAudioSession.sharedInstance requestRecordPermission:^ ( BOOL granted ) {
if ( granted )
{
self.startCapture;
}
}];
return NO;
}
}
- ( void ) startCapture
{
if ( ! self.busy )
{
self.busy = YES;
// Create the capture session.
NSError * avErr;
AVCaptureSession * captureSession = [[AVCaptureSession alloc] init];
// Default anyhow
captureSession.sessionPreset = AVCaptureSessionPresetHigh;
// Lookup the default audio device.
AVCaptureDevice * audioDevice = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
if ( [audioDevice lockForConfiguration: & avErr] )
{
// Wrap the audio device in a capture device input.
AVCaptureDeviceInput * audioInput = [AVCaptureDeviceInput deviceInputWithDevice:audioDevice
error: & avErr];
audioDevice.unlockForConfiguration;
if ( audioInput )
{
// If the input can be added, add it to the session.
if ( [captureSession canAddInput:audioInput] )
{
[captureSession addInput:audioInput];
AVCaptureAudioDataOutput * audioOutput = [[AVCaptureAudioDataOutput alloc] init];
if ( [captureSession canAddOutput:audioOutput] )
{
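// Deliver sample buffers on a private serial queue so they arrive
// in order without blocking the main thread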
[audioOutput setSampleBufferDelegate:self
queue:dispatch_queue_create ( "ears", NULL )];
[captureSession addOutput:audioOutput];
// Do on background
dispatch_async ( dispatch_queue_create ( "spotty", NULL ), ^ {
NSLog ( @"Come to papa" );
captureSession.startRunning;
// Done
dispatch_async ( dispatch_get_main_queue (), ^ {
self.busy = NO;
self.avSession = captureSession;
self.avDevice = audioDevice;
self.avInput = audioInput;
self.avOutput = audioOutput;
} );
} );
}
else
{
NSLog ( @"Not today : add output" );
self.busy = NO;
}
}
else
{
NSLog( @"Sorry : add input" );
self.busy = NO;
}
}
else
{
NSLog( @"Ooops %@", avErr );
self.busy = NO;
}
}
else
{
NSLog( @"No exclusivity %@", avErr );
self.busy = NO;
}
}
}
#pragma mark -
#pragma mark Audio capture delegate
- ( void ) captureOutput:( AVCaptureOutput * ) output
didOutputSampleBuffer:( CMSampleBufferRef ) sampleBuffer
fromConnection:( AVCaptureConnection * ) connection
{
CMItemCount n = CMSampleBufferGetNumSamples ( sampleBuffer );
// We have our standards
if ( n == 1024 )
{
AudioBufferList audioBufferList;
CMBlockBufferRef blockBuffer = NULL;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer (
sampleBuffer,
NULL,
& audioBufferList,
sizeof ( audioBufferList ),
NULL,
NULL,
kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
& blockBuffer
);
// Loop buffers
for ( int b = 0; b < audioBufferList.mNumberBuffers; b ++ )
{
// Evaluate samples
[self fft:audioBufferList.mBuffers [ b ].mData];
}
// Release the baby ... I mean buffer
if ( blockBuffer )
{
CFRelease ( blockBuffer );
}
}
}
- ( void ) fft:( SInt16 * ) samples
{
// In place so r and i are both input and output
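// Packing even samples into realp and odd samples into imagp below
// is the split layout vDSP_fft_zrip expects for a real-to-complex
// FFT (the same arrangement vDSP_ctoz produces from interleaved input)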
COMPLEX_SPLIT c;
float r [ 512 ];
float i [ 512 ];
c.realp = r;
c.imagp = i;
// Load it and calculate the maximum amplitude along the way
int amp = 0;
for ( int s = 0; s < 512; s ++ )
{
SInt16 ev = samples [ s * 2 ];
SInt16 od = samples [ s * 2 + 1 ];
// Convert to float
r [ s ] = ( float ) ev;
i [ s ] = ( float ) od;
if ( amp < ev )
{
amp = ev;
}
if ( amp < od )
{
amp = od;
}
}
// Only proceed if we have a big enough amplitude
if ( amp > THRESHOLD )
{
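// Note: creating the FFT setup for every buffer is wasteful; it
// could be created once and reused across callbacks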
FFTSetup fft = vDSP_create_fftsetup ( 10, kFFTRadix2 );
if ( fft )
{
// FFT!
vDSP_fft_zrip ( fft, & c, 1, 10, FFT_FORWARD );
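// After zrip the DC term is in realp [ 0 ] and the Nyquist term is
// packed into imagp [ 0 ], which is why the peak search starts at bin 1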
// Find the strongest bin (power = magnitude squared)
int maxS = 0;
float maxF = 0;
for ( int s = 1; s < 512; s ++ )
{
float f = r [ s ] * r [ s ] + i [ s ] * i [ s ];
if ( f > maxF )
{
maxF = f;
maxS = s;
}
}
// Dealloc
vDSP_destroy_fftsetup ( fft );
// Done
dispatch_async ( dispatch_get_main_queue (), ^ {
[self measure:maxS
n:1024];
} );
}
}
}
@end
Why does this work well on a device but refuse to on a simulator?
Then, a secondary question, since I have given all the detail here: any ideas on how to improve the accuracy, or will that only come from using more frequency triggers?
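(For the accuracy part, one direction that seems plausible is windowing the samples before the FFT to reduce spectral leakage; a minimal sketch with vDSP, assuming the same 1024-sample SInt16 buffers as in fft: above, and noting the window could be built once and cached:)
// Convert the 1024 16-bit samples to float in one call
float fsamples [ 1024 ];
vDSP_vflt16 ( samples, 1, fsamples, 1, 1024 );
// Build a Hann window and apply it in place to tame leakage
float window [ 1024 ];
vDSP_hann_window ( window, 1024, vDSP_HANN_NORM );
vDSP_vmul ( fsamples, 1, window, 1, fsamples, 1, 1024 );
// Pack into the split-complex layout exactly as before
COMPLEX_SPLIT c;
float r [ 512 ];
float i [ 512 ];
c.realp = r;
c.imagp = i;
vDSP_ctoz ( ( COMPLEX * ) fsamples, 2, & c, 1, 512 );
// ... then vDSP_fft_zrip and the peak search as in fft: above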
TIA