
I'm creating a speaker-authentication system based on human speech.

The system will have a directory holding a reference speech recording, and the current recording will be compared against it.

After comparison, it should be able to recognise the person. I don't know whether this is possible or not.
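
From what I've read, comparing raw recordings cannot identify a speaker; a common baseline is to reduce each recording to a feature vector and compare those. Below is a minimal sketch of the comparison I have in mind, assuming librosa is installed (pip install librosa); the helper names, file names, and the 0.75 threshold are illustrative choices of mine, not tuned values:

import numpy as np
import librosa

def voice_embedding(path, sr=16000, n_mfcc=20):
    """Summarise a recording as its mean MFCC vector (a crude voiceprint)."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, n_frames)
    return mfcc.mean(axis=1)                                # shape (n_mfcc,)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

enrolled = voice_embedding("./a_1.wav")   # reference recording from the directory
probe = voice_embedding("./a_2.wav")      # freshly captured recording

score = cosine_similarity(enrolled, probe)
print("similarity:", score)
print("same speaker?", score > 0.75)      # 0.75 is an arbitrary illustrative threshold

Real speaker-verification systems replace the averaged MFCC with stronger models (GMM-UBM, i-vectors, or neural embeddings), but this shows the shape of the pipeline.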

Currently, I am able to do the following:

  1. Save the audio file from the microphone.
  2. Convert speech to text.
  3. Get the audio's shape, duration and data type.
  4. Plot a graph of the audio file.

Code:

import speech_recognition as sr
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

# Record from the microphone and save the capture as a WAV file
r = sr.Recognizer()
with sr.Microphone() as source:
    audio = r.listen(source)
    with open("./a_1.wav", "wb") as f:
        f.write(audio.get_wav_data())

# Speech-to-text via the Google Web Speech API; recognize_google raises
# sr.UnknownValueError when the speech is unintelligible
try:
    a = r.recognize_google(audio)
    print(a)
except sr.UnknownValueError:
    print("Could not understand audio")

# Basic signal properties
frequency_sampling, audio_signal = wavfile.read("./a_1.wav")
print('Signal shape:', audio_signal.shape)
print('Signal datatype:', audio_signal.dtype)
print('Signal duration:', round(audio_signal.shape[0] / float(frequency_sampling), 2), 'seconds')

# Normalise 16-bit PCM samples to the range [-1, 1]
audio_signal = audio_signal / np.power(2, 15)

# Only the first half of the FFT bins carry information for a real signal
length_signal = len(audio_signal)
half_length = int(np.ceil((length_signal + 1) / 2.0))

# One-sided amplitude spectrum of the real signal
signal_frequency = np.fft.fft(audio_signal)
signal_frequency = abs(signal_frequency[0:half_length]) / length_signal
signal_frequency **= 2

# Double all bins except DC (and Nyquist, for even-length signals)
# to conserve the energy folded in from the negative frequencies
len_fts = len(signal_frequency)
if length_signal % 2:
    signal_frequency[1:len_fts] *= 2
else:
    signal_frequency[1:len_fts-1] *= 2

# Convert power to decibels
signal_power = 10 * np.log10(signal_frequency)

# Frequency axis in kHz
x_axis = np.arange(0, half_length, 1) * (frequency_sampling / length_signal) / 1000.0

plt.figure()
plt.plot(x_axis, signal_power, color='black')
plt.xlabel('Frequency (kHz)')
plt.ylabel('Signal power (dB)')
plt.show()

# Time axis in milliseconds for the waveform plot
time_axis = 1000 * np.arange(0, length_signal, 1) / float(frequency_sampling)

plt.figure()
plt.plot(time_axis, audio_signal, color='blue')
plt.xlabel('Time (milliseconds)')
plt.ylabel('Amplitude')
plt.title('Input audio signal')
plt.show()

For speech comparison I tried:

import audiodiff
print(audiodiff.audio_equal('a_1.wav', 'a_2.wav', ffmpeg_bin=None))
# False
  • a_1.wav and a_2.wav contain the same spoken content, yet audio_equal returns False.
  • The matplotlib graphs of the two recordings also look different (see the DTW sketch after this list).
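
As far as I understand, audiodiff.audio_equal checks whether the decoded audio streams are exactly equal, so two separate recordings of the same phrase will practically never match even when the words are identical. A duration-tolerant way to compare recordings is to align their MFCC sequences with dynamic time warping (DTW); below is a minimal sketch using librosa's DTW (librosa assumed installed; the 'cosine' metric and the per-frame normalisation are illustrative choices, not the only options):

import librosa

def dtw_cost(path_a, path_b, sr=16000, n_mfcc=20):
    """Align the MFCC sequences of two recordings with dynamic time
    warping and return the accumulated cost per aligned frame pair."""
    y_a, _ = librosa.load(path_a, sr=sr)
    y_b, _ = librosa.load(path_b, sr=sr)
    mfcc_a = librosa.feature.mfcc(y=y_a, sr=sr, n_mfcc=n_mfcc)
    mfcc_b = librosa.feature.mfcc(y=y_b, sr=sr, n_mfcc=n_mfcc)
    D, wp = librosa.sequence.dtw(X=mfcc_a, Y=mfcc_b, metric='cosine')
    return D[-1, -1] / len(wp)   # normalise by warping-path length

print(dtw_cost('a_1.wav', 'a_2.wav'))   # smaller cost = more similar

A smaller normalised cost means the recordings are more alike; the accept/reject threshold would have to be tuned on enrolled recordings.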

Can anyone help me with human speech authentication?

  • This seems really non-trivial and unlikely to be doable without machine learning. – Pierre.Sassoulas Feb 12 '19 at 10:38
  • Why is this on hold till now? Do I need to improve my question? – Ujwala Patil Feb 14 '19 at 11:50
  • The question is very broad and probably unanswerable as it is. You could study speech recognition (here for example: https://realpython.com/python-speech-recognition/) and come back when you have a specific problem. Or you could focus on the specific problem you have (two identical sound files should return True). – Pierre.Sassoulas Feb 14 '19 at 12:44
  • I think it is possible; the **matplotlib** graph created for every person's voice is different. I just want to know how to differentiate the graphs. – Ujwala Patil Feb 20 '19 at 05:34
  • @UjwalaPatil when users decide to close a question, they never look back to reopen it. It's sad, but that's how users work here. – L F May 05 '20 at 14:37
