I found this speech recognition code on a blog. It works fine: it asks you to record sounds to create a dataset, and then you call a function to train the system using a neural network.
I want to use this code to train on my own dataset of 20 words that I want to recognise.
Problem: I have a dataset of 800 files for twenty words, i.e. 40 recordings from different people for each word. I used Windows Sound Recorder to collect the files. The problem is that the code ALWAYS sets the size of the input signal to 8000 samples, whereas my files are not a constant length: some are 2 seconds long and some are 3, which means each file has a different number of samples.
If the number of samples per input signal varies, it will probably generate errors. I want to use my files to train the system. How do I do that?
Code:
clc; clear all;
load('voicetrainfinal.mat'); % presumably provides the trained weights Theta1 and Theta2
Fs = 8000; % sampling rate in Hz
for l = 1:20
    clear y1 y2 y3;
    display('record voice');
    pause();
    x = wavrecord(Fs, Fs); % wavrecord(n,Fs) records n samples at a sampling rate of Fs, so this is always 1 second
    maxval = max(x);
    if maxval < 0.04
        display('Threshold value is too large!');
    end
    t = 0.04; % amplitude threshold
    j = 1;
    for i = 1:8000 % hard-coded input length
        if (abs(x(i)) > t)
            y1(j) = x(i); % keep only the samples above the threshold
            j = j + 1;
        end
    end
    y2 = y1 / (max(abs(y1))); % normalise to a peak amplitude of 1
    y3 = [y2, zeros(1, 3120 - length(y2))]; % zero-pad to 3120 samples
    y = filter([1 -0.9], 1, y3'); % high pass filter to boost the high frequency components
    %% frame blocking
    blocklen = 240; % 30 ms block
    overlap = 80;
    block(1,:) = y(1:240);
    for i = 1:18
        block(i+1,:) = y(i*160:(i*160+blocklen-1));
    end
    w = hamming(blocklen);
    for i = 1:19
        a = xcorr((block(i,:).*w'), 12); % finding auto correlation from lag -12 to 12
        for j = 1:12
            auto(j,:) = fliplr(a(j+1:j+12)); % forming autocorrelation matrix from lag 0 to 11
        end
        z = fliplr(a(1:12)); % row vector of autocorrelations for lags 1 to 12 (transposed below)
        alpha = pinv(auto) * z';
        lpc(:,i) = alpha; % 12 LPC coefficients for frame i
    end
    wavplay(x, Fs);
    X1 = reshape(lpc, 1, 228); % 12 coefficients x 19 frames = 228 features
    a1 = sigmoid(Theta1 * [1; X1']); % hidden layer activations
    h = sigmoid(Theta2 * [1; a1]); % output layer
    m = max(h);
    p1 = find(h == m); % index of the largest output
    if (p1 == 10)
        P = 0
    else
        P = p1
    end
end
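
To make the length problem concrete, here is a rough sketch of how one recording of arbitrary length might be brought to the fixed 3120-sample vector that the code above feeds into the filter. It is only an illustration under assumptions, not the blog's method: the function name prepareWord and the file name word01.wav are hypothetical, truncating signals longer than 3120 samples is my own guess at a reasonable policy, and the 0.04 threshold and 3120 length are simply copied from the code above. It uses resample, which needs the Signal Processing Toolbox (as do hamming and xcorr above).

```matlab
% Usage with a synthetic 2-second tone standing in for a recording.
% For a real file: [x, fsIn] = audioread('word01.wav');  % hypothetical file name
fsIn = 16000;
x = 0.5 * sin(2*pi*440*(0:2*fsIn-1)'/fsIn);
y3 = prepareWord(x, fsIn);
disp(size(y3));

function y3 = prepareWord(x, fsIn)
% PREPAREWORD  Hypothetical helper: turn a recording of any length into a
% 1-by-3120 row vector, mirroring the threshold/normalise/pad steps above.
% x is samples-by-channels (the layout audioread returns), fsIn is in Hz.
    Fs = 8000;          % sampling rate the code above assumes
    targetLen = 3120;   % fixed length taken from the code above
    t = 0.04;           % amplitude threshold taken from the code above

    x = mean(x, 2);     % mix stereo down to mono (no effect on mono input)
    if fsIn ~= Fs
        x = resample(x, Fs, fsIn);   % convert to 8000 Hz
    end

    y1 = x(abs(x) > t).';            % keep samples above the threshold, as a row
    if isempty(y1)
        error('prepareWord:silent', 'No samples exceed the threshold.');
    end
    y2 = y1 / max(abs(y1));          % normalise to a peak amplitude of 1

    % Assumption: truncate if too long, zero-pad if too short.
    n = min(numel(y2), targetLen);
    y3 = [y2(1:n), zeros(1, targetLen - n)];
end
```

With something like this, the file duration would no longer matter, because every file ends up as the same 3120 samples before frame blocking. Whether truncation throws away useful speech in the 3-second files is something I would still have to check.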