Questions tagged [kaldi]

Kaldi speech recognition toolkit

Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.

113 questions
8
votes
1 answer

Which tool can I trust?

I seem to have problems determining which tool I can trust... The tools I've been testing are Librosa and Kaldi, for creating datasets to plot visualizations of 40 filterbank energies of an audio file. The filterbank energies are extracted using…
I am not Fat
  • 283
  • 11
  • 36
5
votes
2 answers

Is it possible to install Kaldi on Google Colab?

I want to use Google Colab in a research project using Kaldi ASR. Is it possible to install it, and where can I find the Kaldi files after installation?
Maher
  • 53
  • 1
  • 4
4
votes
2 answers

Building Kaldi on a Mac running macOS Catalina

I’m using a Mac Pro running macOS Catalina (v10.15.1). Has anyone managed to build Kaldi with this version of the OS? Specifically, one of Kaldi's dependencies is the Intel Math Kernel Library (MKL or some other suitable matrix algebra library).…
Mark
  • 43
  • 2
4
votes
1 answer

Why is the plot in librosa different?

I am currently trying to use librosa to perform an STFT such that the parameters match the STFT process from a different framework (Kaldi). The audio file is fash-b-an251. Kaldi does it using a sample frequency of 16 kHz, window_size = 400 (25 ms),…
I am not Fat
  • 283
  • 11
  • 36
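
As a quick sanity check on the numbers quoted in the excerpt above (16 kHz sample rate, 25 ms window, window_size = 400), the conversion from milliseconds to samples can be sketched in a few lines of Python. This is only an arithmetic illustration, not code from Kaldi or librosa; the helper name `frame_length_samples` is hypothetical.

```python
def frame_length_samples(sample_rate_hz: int, window_ms: float) -> int:
    """Convert a window duration in milliseconds to a length in samples."""
    # samples = (samples per second) * (seconds per window)
    return int(round(sample_rate_hz * window_ms / 1000.0))


# Values taken from the question excerpt: 16 kHz audio, 25 ms window.
window_size = frame_length_samples(16000, 25)
print(window_size)  # 400
```

When comparing two toolkits, matching this sample count (and the corresponding frame shift) is usually the first thing to verify before looking at other differences.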
3
votes
2 answers

Kaldi python2 binary issue

I am installing Kaldi on Ubuntu 18.04. python2.7 is one of the dependencies needed to install Kaldi. I installed python2.7 with sudo apt-get install python2.7. Then, to check the prerequisites, I ran extras/check_dependencies.sh. The result shows -…
3
votes
1 answer

Does Kaldi return any recognition confidence parameter, similar to Google Speech-To-Text API?

I am dealing with a speech recognition task. So far, I have been using the Google Cloud Speech Recognition API (in Python) with good results. The API returns a confidence value along with every chunk of the transcribed text. The confidence is a…
3
votes
2 answers

French language support in Kaldi

I am working with Kaldi, but there is no info on its webpage about which languages it supports for conversion. Can I use Kaldi for French speech-to-text conversion? I need to develop an offline French learning app. I tried PocketSphinx but the…
Sumit Vaise
  • 138
  • 1
  • 10
3
votes
0 answers

unable to send message to qmaster using port 6444 on host "*******": got send error. Exiting

error: commlib error: got select error (Connection refused) Unable to run job: unable to send message to qmaster using port 6444 on host "naveen": got send error. Exiting. queue.pl: It looks like the queue master may be inaccessible. Trying again…
Nithin Reddy
  • 57
  • 12
3
votes
1 answer

Spectrograms generated using Librosa don't look consistent with Kaldi?

I generated a spectrogram of a "seven" utterance using the "egs/tidigits" code from Kaldi, using 23 bins, a 20 kHz sampling rate, a 25 ms window, and a 10 ms shift. The spectrogram appears as below, visualized via the MATLAB imagesc function: I am experimenting with…
kashkar
  • 663
  • 1
  • 8
  • 22
3
votes
0 answers

How do I extract the posterior probability of the HMM?

I just extracted an alignment from my model at the frame level. fash-b-an251 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 134 134 134 134 134 134 134 134 134 44 44 44 44 44 44 44 44 44 111 111 111 111 111 111 111 111 111 111 1 1 1…
I am not Fat
  • 283
  • 11
  • 36
3
votes
1 answer

Understanding audio file spectrogram values

I am currently struggling to understand how the power spectrum is stored in the Kaldi framework. I seem to have successfully created some data files using $cmd JOB=1:$nj $logdir/spect_${name}.JOB.log \ compute-spectrogram-feats --verbose=2 \ …
I am not Fat
  • 283
  • 11
  • 36
2
votes
0 answers

How to create a high-quality wake-word solution for an Android/iOS app? Which technology stacks to try?

I am trying: TensorFlow Lite - unable to achieve the desired accuracy, and the training sample size requirement seems to be very high. Q1) What's the minimum sample size required? Kaldi - size seems to be around 30 MB, prohibitively high. Q2) What are…
2
votes
1 answer

Does "precision" of audio files have importance during training ASR systems?

I am resampling audio files from 8 kHz to 16 kHz with torchaudio. An example of an original file: Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 8000 Hz, 1 channels, s16, 128 kb/s After resampling, it becomes: Stream #0:0: Audio: pcm_f32le…
Yehor Smoliakov
  • 326
  • 3
  • 13
2
votes
1 answer

Cannot compile .c because it cannot find .h file

I'm on Ubuntu 18.04 and I'm trying to compile a .c file that came with an API I'm working with, called Vosk. The issue is that the Python code works without any problems, but if I try to gcc test_vosk.c -o test_vosk the .c file that they…
Birto
  • 71
  • 5
2
votes
1 answer

Vosk (Kaldi) offline speech recognition in Unity

How do I integrate and use the Vosk library in a Unity project? Please write steps 1, 2, 3... The Vosk library is here - https://github.com/alphacep/vosk-api