Questions tagged [kenlm]

KenLM is a fast and low-memory language modeling toolkit that scales to trillions of words.

23 questions
6
votes
1 answer

Cannot install kenlm package in anaconda environment

When trying to install the python wrapper of kenlm from pip within an anaconda environment, I get the error: (lm_1b) adamg:lm_1b adamg$ pip install https://github.com/kpu/kenlm/archive/master.zip Collecting…
Adam_G
  • 7,337
  • 20
  • 86
  • 148
5
votes
4 answers

How to compute perplexity using KenLM?

Let's say we build a model on this: $ wget https://gist.githubusercontent.com/alvations/1c1b388456dc3760ffb487ce950712ac/raw/86cdf7de279a2b9bceeb3adb481e42691d12fbba/something.txt $ lmplz -o 5 < something.txt > something.arpa From the perplexity…
alvas
  • 115,346
  • 109
  • 446
  • 738
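As a rough illustration of the perplexity question above, here is a minimal sketch of how perplexity follows from a sentence's total log10 probability. It assumes the scores are base-10 logs and that the end-of-sentence token is counted as one extra scored token, which is how the kenlm Python wrapper's `perplexity` method appears to behave. The per-token values are made up for illustration and do not come from `something.arpa`.

```python
def perplexity(total_log10_prob: float, num_words: int) -> float:
    """Perplexity from a sentence's total log10 probability.

    Assumption: the total includes the end-of-sentence token, so the
    number of scored tokens is num_words + 1.
    """
    tokens = num_words + 1  # the words plus </s>
    return 10.0 ** (-total_log10_prob / tokens)


# Hypothetical per-token log10 probabilities: three words plus </s>.
token_log10_probs = [-1.0, -2.0, -0.5, -0.5]
total = sum(token_log10_probs)

ppl = perplexity(total, num_words=3)
print(f"total log10 prob: {total}, perplexity: {ppl:.4f}")
```

With a real model, the total would come from something like `model.score(sentence)` in the kenlm Python module rather than from a hand-written list.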
4
votes
1 answer

How to relate the language model score of a whole sentence to those of the sentence's constituents

I trained a KenLM language model on around 5000 English sentences/paragraphs. I want to query this ARPA model with two or more segments and see whether they can be concatenated to form a longer, hopefully more "grammatical," sentence. Here as follows is…
Wei JIANG
  • 71
  • 4
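One way to think about the question above: an n-gram sentence score is a sum of per-token conditional log probabilities, so the score of a whole sentence generally differs from the sum of the scores of its segments, because each separately scored segment gets its own sentence boundaries and loses the context of the preceding segment. The sketch below uses a hand-written toy bigram table (all values hypothetical, with a flat penalty standing in for real backoff), not KenLM itself.

```python
# Hypothetical bigram log10 probabilities; not taken from any trained model.
BIGRAM_LOG10 = {
    ("<s>", "the"): -0.3,
    ("the", "cat"): -0.7,
    ("cat", "sat"): -0.5,
    ("sat", "</s>"): -0.2,
    ("cat", "</s>"): -1.5,
    ("<s>", "sat"): -2.0,
}
UNSEEN_LOG10 = -4.0  # crude flat penalty standing in for real backoff


def sentence_score(words):
    """Sum log10 P(token | previous token) over <s> words </s>."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    return sum(BIGRAM_LOG10.get(pair, UNSEEN_LOG10)
               for pair in zip(tokens, tokens[1:]))


whole = sentence_score(["the", "cat", "sat"])
parts = sentence_score(["the", "cat"]) + sentence_score(["sat"])
print(f"whole: {whole:.2f}  sum of parts: {parts:.2f}")
```

In this toy setup the whole sentence scores higher than the sum of its parts, since the parts pay for an extra end-of-sentence and start-of-sentence transition. The kenlm Python module's `score` accepts `bos` and `eos` flags, which may help when scoring fragments without those boundary tokens.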
2
votes
0 answers

TensorFlow and KenLM

How does one use KenLM with TensorFlow as a decoder? I know about the tensorflow-with-kenlm TF fork, but it is based on TF version 1.1, which lacks many features that are important for my project.
Andrii Tytarenko
  • 121
  • 1
  • 10
2
votes
0 answers

When loading KenLM language model for scoring sentences should the LM file size be less than RAM size?

When loading a language model for scoring sentences, should the LM ('bible.klm') file size be less than the RAM size? import kenlm model = kenlm.LanguageModel('bible.klm') model.score('in the beginning was the word')
Arshiyan Alam
  • 335
  • 1
  • 11
1
vote
1 answer

Cannot allocate memory Failed to allocate when using KenLM build_binary

I have an ARPA file, which I created with the following command: ./lmplz -o 4 -S 1G 100m.arpa Now I want to convert this ARPA file to a binary file: ./build_binary 100m.arpa 100m.bin And I'm getting an error: mmap.cc:225 in void…
user3668129
  • 4,318
  • 6
  • 45
  • 87
1
vote
0 answers

Toolchain.cmake to cross-compile kenlm for Android

I am trying to make the kenlm binaries usable on Android. KenLM is written in C++ and uses CMake, so I tried writing a toolchain file to cross-compile with CMake. My toolchain file looks like this: set(CMAKE_SYSTEM_NAME Android) set(CMAKE_SYSTEM_VERSION…
1
vote
0 answers

Different probabilities between kenlm and berkeleylm

I built n-gram language models using kenlm and berkeleylm, but they give very different probabilities to the token . The kenlm model gives: ngram 1=164482 ngram 2=4355352 ngram 3=15629476 \1-grams: -6.701107 0 0 -1.9270477 -1.8337007
K_Augus
  • 372
  • 2
  • 14
1
vote
1 answer

Installing Python package from source using Microsoft Visual Build Tools 2017

I have a Python package that is failing to install due to a dependency on Windows build tools. Things I have tried: Install latest version of Visual Studio 2017 (AFAIK it should contain Microsoft Visual C++ 14.0). Install Build Tools for Visual Studio…
Sledge
  • 1,245
  • 1
  • 23
  • 47
1
vote
2 answers

Python: KenLM installation error

I am installing KenLM on Python2.7 on Windows 7 64 bit with the following command: pip install https://github.com/kpu/kenlm/archive/master.zip Error message: C:\Python27\Scripts>pip install https://github.com/kpu/kenlm/archive/master.zip …
Phyu Khaing
  • 11
  • 1
  • 2
1
vote
1 answer

Issue with Tensorflow Kenlm

How do I install TensorFlow with kenlm? Apparently, TensorFlow's CTC beam search decoder has no argument for kenlm. How can we integrate kenlm within that function?
Appu
  • 83
  • 9
0
votes
0 answers

How to train KenLM language model for Nvidia's QuartzNet?

I am trying to train a speech-to-text model for the Armenian language. I am using the Nvidia NeMo toolkit. After training the acoustic model, I used the provided NeMo/scripts/asr_language_modeling/ngram_lm/train_kenlm.py file to train the language…
arm
  • 56
  • 1
  • 12
0
votes
0 answers

Why do I need to add --discount_fallback?

I have a simple English file: I'm Harry Potter Harry Potter is young wizard Hermione Granger is Harry friend There are seven fantasy novels of Harry Potter I'm running the following command: lmplz -o 3 myTest.arpa And getting…
user3668129
  • 4,318
  • 6
  • 45
  • 87
0
votes
0 answers

Getting Segmentation fault when running lmplz (KenLM)

I'm following this article: https://huggingface.co/blog/wav2vec2-with-ngram and I'm running the following command: kenlm/build/bin/lmplz -o 5 <"text.txt" > "5gram.arpa" And I'm getting…
user3668129
  • 4,318
  • 6
  • 45
  • 87
0
votes
1 answer

Kenlm lmplz on Google Colab

I used KenLM to train a language model on Google Colab. This is what I have in the bin folder: %cd /content/drive/My Drive/kenlm/build/bin !ls /content/drive/My Drive/kenlm/build/bin build_binary 'lm (1).en.arpa' phrase_table_vocab …