KenLM is a fast and low-memory language modeling toolkit that scales to trillions of words.
Questions tagged [kenlm]
23 questions
6
votes
1 answer
Cannot install kenlm package in anaconda environment
When trying to install the Python wrapper of kenlm with pip inside an Anaconda environment, I get the error:
(lm_1b) adamg:lm_1b adamg$ pip install https://github.com/kpu/kenlm/archive/master.zip
Collecting…

Adam_G
- 7,337
- 20
- 86
- 148
5
votes
4 answers
How to compute perplexity using KenLM?
Let's say we build a model on this:
$ wget https://gist.githubusercontent.com/alvations/1c1b388456dc3760ffb487ce950712ac/raw/86cdf7de279a2b9bceeb3adb481e42691d12fbba/something.txt
$ lmplz -o 5 < something.txt > something.arpa
From the perplexity…
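For reference, perplexity follows directly from log10 scores of the kind KenLM reports: it is 10 raised to the negative average log10 probability per token. The sketch below is a minimal, library-free illustration. The function names are hypothetical, the numbers are made up, and counting one extra token for the end-of-sentence marker is an assumption that applies when </s> is included in the score.

```python
def perplexity_from_log10(total_log10_prob, num_tokens):
    """Perplexity = 10 ** (-(sum of log10 probabilities) / token count)."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return 10.0 ** (-total_log10_prob / num_tokens)


def sentence_token_count(sentence, count_eos=True):
    """Whitespace tokens, plus one for </s> when it is scored (assumption)."""
    return len(sentence.split()) + (1 if count_eos else 0)


# Illustrative numbers only, not output from a real model.
n = sentence_token_count("in the beginning")  # 3 words plus </s>
ppl = perplexity_from_log10(-8.0, n)           # 10 ** (8.0 / 4)
print(n, ppl)
```

With the kenlm Python wrapper, the total would come from a call such as model.score(sentence), which returns a log10 probability. Recent versions of the wrapper also appear to expose a perplexity() method, so check your installed version before rolling your own.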

alvas
- 115,346
- 109
- 446
- 738
4
votes
1 answer
How to relate the language model score of a whole sentence to those of the sentence's constituents
I trained a KenLM language model on around 5000 English sentences/paragraphs. I want to query this ARPA model with two or more segments and see if they can be concatenated to form a longer sentence, hopefully more "grammatical." Here as follows is…
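As background for this kind of question: an n-gram model scores a sentence as the sum of per-word conditional log probabilities, so the score of a concatenation generally differs from the sum of the segments' separate scores, because the words at the boundary see different context. The sketch below shows this with a toy bigram table. All probabilities and the function name are hypothetical, and every segment is scored with both <s> and </s>, which is an assumption about how the segments are queried.

```python
import math

# Hypothetical bigram probabilities P(word | previous word), for illustration.
BIGRAM = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("cat", "sat"): 0.1,
    ("sat", "</s>"): 0.4,
    ("cat", "</s>"): 0.3,
    ("<s>", "sat"): 0.05,
}


def log10_score(words):
    """Sum of log10 P(w_i | w_{i-1}) over the sentence, padded with <s> and </s>."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    return sum(math.log10(BIGRAM[(prev, cur)])
               for prev, cur in zip(tokens, tokens[1:]))


whole = log10_score(["the", "cat", "sat"])                 # one sentence
parts = log10_score(["the", "cat"]) + log10_score(["sat"])  # two sentences
print(whole, parts)
```

Here the two totals differ because "sat" is conditioned on "cat" in the whole sentence but on <s> when scored alone, and the first segment also pays for an extra </s>. In the kenlm Python wrapper, score() takes bos and eos arguments that control this padding.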

Wei JIANG
- 71
- 4
2
votes
0 answers
TensorFlow and KenLM
How does one use KenLM with TensorFlow as a decoder?
I know about the tensorflow-with-kenlm fork of TensorFlow, but it is based on TensorFlow 1.1, which lacks many features that are important for my project.

Andrii Tytarenko
- 121
- 1
- 10
2
votes
0 answers
When loading KenLM language model for scoring sentences should the LM file size be less than RAM size?
When loading a language model to score sentences, should the size of the LM file ('bible.klm') be less than the RAM size?
import kenlm
model = kenlm.LanguageModel('bible.klm')
model.score('in the beginning was the word')

Arshiyan Alam
- 335
- 1
- 11
1
vote
1 answer
Cannot allocate memory Failed to allocate when using KenLM build_binary
I have an ARPA file which I created with the following command:
./lmplz -o 4 -S 1G 100m.arpa
Now I want to convert this ARPA file to a binary file:
./build_binary 100m.arpa 100m.bin
And I'm getting this error:
mmap.cc:225 in void…

user3668129
- 4,318
- 6
- 45
- 87
1
vote
0 answers
Toolchain.cmake to cross-compile kenlm for Android
I am trying to make the kenlm binaries usable on Android. KenLM is written in C++ and uses CMake, so I tried to write a toolchain file to cross-compile with CMake.
My toolchain file looks like this:
set(CMAKE_SYSTEM_NAME Android)
set(CMAKE_SYSTEM_VERSION…

user13583939
- 63
- 7
1
vote
0 answers
Different probabilities between kenlm and berkeleylm
I built n-gram language models using kenlm and berkeleylm, but they give very different probabilities to token .
The kenlm gives:
ngram 1=164482
ngram 2=4355352
ngram 3=15629476
\1-grams:
-6.701107 <unk> 0
0 <s> -1.9270477
-1.8337007 …

K_Augus
- 372
- 2
- 14
1
vote
1 answer
Installing Python package from source using Microsoft Visual Build Tools 2017
I have a Python package that is failing to install due to a dependency on Windows build tools.
Things I have tried:
Install latest version of Visual Studio 2017 (AFAIK it should contain Microsoft Visual C++ 14.0).
Install Build Tools for Visual Studio…

Sledge
- 1,245
- 1
- 23
- 47
1
vote
2 answers
Python: KenLM installation error
I am installing KenLM on Python 2.7 on 64-bit Windows 7 with the following command:
pip install https://github.com/kpu/kenlm/archive/master.zip
Error message:
C:\Python27\Scripts>pip install https://github.com/kpu/kenlm/archive/master.zip
…

Phyu Khaing
- 11
- 1
- 2
1
vote
1 answer
Issue with Tensorflow Kenlm
How do I install TensorFlow with KenLM?
Apparently, TensorFlow's CTC beam search decoder has no argument for KenLM. How can we integrate KenLM into that function?

Appu
- 83
- 9
0
votes
0 answers
How to train KenLM language model for Nvidia's QuartzNet?
I am trying to train a speech-to-text model for the Armenian language. I am using the Nvidia NeMo toolkit. After training the acoustic model, I used the provided NeMo/scripts/asr_language_modeling/ngram_lm/train_kenlm.py file to train the language…

arm
- 56
- 1
- 12
0
votes
0 answers
Why do I need to add --discount_fallback?
I have a simple English file:
I'm Harry Potter
Harry Potter is young wizard
Hermione Granger is Harry friend
There are seven fantasy novels of Harry Potter
I'm running the following command:
lmplz -o 3 myTest.arpa
And getting…
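One plausible reason the flag comes up on a corpus this small: modified Kneser-Ney smoothing estimates its discounts from counts-of-counts (how many n-grams occur exactly once, twice, three times, and four times), and on a four-sentence file some of those counts can be zero. The sketch below uses the standard closed-form estimates from Chen and Goodman. It is an illustration under that assumption, not KenLM's actual code, and the function name and input numbers are hypothetical.

```python
def modified_kn_discounts(n1, n2, n3, n4):
    """Discounts D1, D2, D3+ from counts-of-counts n1..n4.

    n_k is the number of distinct n-grams seen exactly k times.
    """
    if 0 in (n1, n2, n3, n4):
        # With a zero count-of-counts the estimates are undefined or degenerate.
        raise ValueError("a count-of-counts is zero; cannot estimate discounts")
    y = n1 / (n1 + 2.0 * n2)
    d1 = 1.0 - 2.0 * y * n2 / n1
    d2 = 2.0 - 3.0 * y * n3 / n2
    d3_plus = 3.0 - 4.0 * y * n4 / n3
    return d1, d2, d3_plus


# A corpus with enough repetition: every count-of-counts is non-zero.
discounts = modified_kn_discounts(10, 4, 2, 1)
print(discounts)

# A tiny corpus where no n-gram occurs exactly three or four times.
try:
    modified_kn_discounts(12, 1, 0, 0)
    tiny_ok = True
except ValueError:
    tiny_ok = False
print(tiny_ok)
```

When the estimate fails in this way, a fallback supplies fixed discount values instead of estimated ones, which is the role --discount_fallback plays in lmplz.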

user3668129
- 4,318
- 6
- 45
- 87
0
votes
0 answers
Getting Segmentation fault when running lmplz (KenLM)
I'm following this article:
https://huggingface.co/blog/wav2vec2-with-ngram
and I'm running the following command:
kenlm/build/bin/lmplz -o 5 <"text.txt" > "5gram.arpa"
And I'm getting…

user3668129
- 4,318
- 6
- 45
- 87
0
votes
1 answer
Kenlm lmplz on Google Colab
I used Kenlm to train a language model on Google Colab.
This is what I have in the bin folder:
%cd /content/drive/My Drive/kenlm/build/bin
!ls
/content/drive/My Drive/kenlm/build/bin
build_binary 'lm (1).en.arpa' phrase_table_vocab …

Renae Nguyen
- 9
- 1