
I'm creating a speaker-authentication system based on human speech.

The system will have a directory holding a reference speech recording, and the current recording will be compared against it.

After comparison, it should be able to recognise the person. I don't know whether this is possible or not.
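
From what I've read, comparing raw recordings cannot identify a speaker; a common baseline is to reduce each recording to a feature vector and compare those. Below is a minimal sketch of the comparison I have in mind, assuming librosa is installed (pip install librosa); the helper names, file names, and the 0.75 threshold are illustrative choices of mine, not tuned values:

import numpy as np
import librosa

def voice_embedding(path, sr=16000, n_mfcc=20):
    """Summarise a recording as its mean MFCC vector (a crude voiceprint)."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, n_frames)
    return mfcc.mean(axis=1)                                # shape (n_mfcc,)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

enrolled = voice_embedding("./a_1.wav")   # reference recording from the directory
probe = voice_embedding("./a_2.wav")      # freshly captured recording

score = cosine_similarity(enrolled, probe)
print("similarity:", score)
print("same speaker?", score > 0.75)      # 0.75 is an arbitrary illustrative threshold

Real speaker-verification systems replace the averaged MFCC with stronger models (GMM-UBM, i-vectors, or neural embeddings), but this shows the shape of the pipeline.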

Currently, I am able to do the following:

  1. Save the audio file from the microphone.
  2. Convert speech to text.
  3. Get the audio's shape, duration and data type.
  4. Plot a graph of the audio file.

Code:

import speech_recognition as sr
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

# Record from the microphone and save the capture as a WAV file
r = sr.Recognizer()
with sr.Microphone() as source:
    audio = r.listen(source)
    with open("./a_1.wav", "wb") as f:
        f.write(audio.get_wav_data())

# Speech-to-text via the Google Web Speech API; recognize_google raises
# sr.UnknownValueError when the speech is unintelligible
try:
    a = r.recognize_google(audio)
    print(a)
except sr.UnknownValueError:
    print("Could not understand audio")

# Basic signal properties
frequency_sampling, audio_signal = wavfile.read("./a_1.wav")
print('Signal shape:', audio_signal.shape)
print('Signal datatype:', audio_signal.dtype)
print('Signal duration:', round(audio_signal.shape[0] / float(frequency_sampling), 2), 'seconds')

# Normalise 16-bit PCM samples to the range [-1, 1]
audio_signal = audio_signal / np.power(2, 15)

# Only the first half of the FFT bins carry information for a real signal
length_signal = len(audio_signal)
half_length = int(np.ceil((length_signal + 1) / 2.0))

# One-sided amplitude spectrum of the real signal
signal_frequency = np.fft.fft(audio_signal)
signal_frequency = abs(signal_frequency[0:half_length]) / length_signal
signal_frequency **= 2

# Double all bins except DC (and Nyquist, for even-length signals)
# to conserve the energy folded in from the negative frequencies
len_fts = len(signal_frequency)
if length_signal % 2:
    signal_frequency[1:len_fts] *= 2
else:
    signal_frequency[1:len_fts-1] *= 2

# Convert power to decibels
signal_power = 10 * np.log10(signal_frequency)

# Frequency axis in kHz
x_axis = np.arange(0, half_length, 1) * (frequency_sampling / length_signal) / 1000.0

plt.figure()
plt.plot(x_axis, signal_power, color='black')
plt.xlabel('Frequency (kHz)')
plt.ylabel('Signal power (dB)')
plt.show()

# Time axis in milliseconds for the waveform plot
time_axis = 1000 * np.arange(0, length_signal, 1) / float(frequency_sampling)

plt.figure()
plt.plot(time_axis, audio_signal, color='blue')
plt.xlabel('Time (milliseconds)')
plt.ylabel('Amplitude')
plt.title('Input audio signal')
plt.show()

For speech comparison I tried:

import audiodiff
print(audiodiff.audio_equal('a_1.wav', 'a_2.wav', ffmpeg_bin=None))
# False
  • a_1.wav and a_2.wav contain the same spoken content, yet audio_equal returns False.
  • The matplotlib graphs of the two recordings also look different (see the DTW sketch after this list).
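
As far as I understand, audiodiff.audio_equal checks whether the decoded audio streams are exactly equal, so two separate recordings of the same phrase will practically never match even when the words are identical. A duration-tolerant way to compare recordings is to align their MFCC sequences with dynamic time warping (DTW); below is a minimal sketch using librosa's DTW (librosa assumed installed; the 'cosine' metric and the per-frame normalisation are illustrative choices, not the only options):

import librosa

def dtw_cost(path_a, path_b, sr=16000, n_mfcc=20):
    """Align the MFCC sequences of two recordings with dynamic time
    warping and return the accumulated cost per aligned frame pair."""
    y_a, _ = librosa.load(path_a, sr=sr)
    y_b, _ = librosa.load(path_b, sr=sr)
    mfcc_a = librosa.feature.mfcc(y=y_a, sr=sr, n_mfcc=n_mfcc)
    mfcc_b = librosa.feature.mfcc(y=y_b, sr=sr, n_mfcc=n_mfcc)
    D, wp = librosa.sequence.dtw(X=mfcc_a, Y=mfcc_b, metric='cosine')
    return D[-1, -1] / len(wp)   # normalise by warping-path length

print(dtw_cost('a_1.wav', 'a_2.wav'))   # smaller cost = more similar

A smaller normalised cost means the recordings are more alike; the accept/reject threshold would have to be tuned on enrolled recordings.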

Can anyone help me with human speech authentication?

  • This seems really non-trivial and unlikely to be doable without machine learning. – Pierre.Sassoulas Feb 12 '19 at 10:38
  • Why is this on hold till now? Do I need to improve my question? – Ujwala Patil Feb 14 '19 at 11:50
  • The question is very broad and probably unanswerable as it is. You could study speech recognition (here for example: https://realpython.com/python-speech-recognition/) and come back when you have a specific problem. Or you could focus on the specific problem you have (two identical sound files should return True). – Pierre.Sassoulas Feb 14 '19 at 12:44
  • I think it is possible; the **matplotlib** graph created for every person's voice is different. I just want to know how to differentiate the graphs. – Ujwala Patil Feb 20 '19 at 05:34
  • @UjwalaPatil when users decide to close a question, they never look back to reopen it. It's sad, but that's how users work here. – L F May 05 '20 at 14:37
