10

I'm looking for a Pythonic interface to load ARPA files (back-off language models) and use them to evaluate some text, e.g. to get its log-probability, perplexity, etc.

I don't need to generate the ARPA file in Python, only to use it for querying.

Does anybody have a recommended package? I already looked at kenlm and swig-srilm, but the first is very hard to set up on Windows and the second no longer seems to be maintained.

Stefanus
Beka

2 Answers

4

I found a nice package called pynlpl, still under development, which does exactly what I need with very few dependencies (libxml2 is about enough) and provides a pure-Python implementation for reading ARPA files.
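
To give an idea of what a pure-Python ARPA query involves, here is a minimal sketch of back-off scoring. It is not pynlpl's code: the names `parse_arpa`, `log_p`, `log_s`, `perplexity` and the toy model `TOY_ARPA` are made up for illustration, the toy values are not a properly normalized model, and unknown words are not mapped to `<unk>` as a real toolkit would do.

```python
# Toy ARPA model (hypothetical values, log10 as in the ARPA format).
TOY_ARPA = r"""
\data\
ngram 1=4
ngram 2=3

\1-grams:
-99 <s> -0.5
-0.6 a -0.4
-0.7 b -0.3
-0.5 </s>

\2-grams:
-0.2 <s> a
-0.3 a b
-0.4 b </s>

\end\
"""


def parse_arpa(lines):
    """Return ({ngram: (log10_prob, log10_backoff)}, max_order)."""
    table, order, max_order = {}, 0, 0
    for raw in lines:
        line = raw.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue  # skip blanks and the header counts
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])  # e.g. "\2-grams:" -> 2
            max_order = max(max_order, order)
            continue
        fields = line.split()
        words = tuple(fields[1:1 + order])
        # The back-off weight is optional; a missing weight means log10(1) = 0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        table[words] = (float(fields[0]), backoff)
    return table, max_order


def log_p(table, context, word):
    """log10 P(word | context), backing off to shorter contexts."""
    ngram = context + (word,)
    if ngram in table:
        return table[ngram][0]
    if not context:
        raise KeyError(word)  # unknown word; no <unk> handling in this sketch
    backoff = table.get(context, (0.0, 0.0))[1]
    return backoff + log_p(table, context[1:], word)


def log_s(table, order, sentence):
    """log10 probability of a whitespace-tokenized sentence with <s> ... </s>."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for i in range(1, len(tokens)):
        context = tuple(tokens[max(0, i - order + 1):i])
        total += log_p(table, context, tokens[i])
    return total


def perplexity(table, order, sentence):
    """Per-token perplexity, counting the words plus the closing </s>."""
    n_tokens = len(sentence.split()) + 1
    return 10 ** (-log_s(table, order, sentence) / n_tokens)


table, order = parse_arpa(TOY_ARPA.splitlines())
print(round(log_s(table, order, "a b"), 6))  # all bigrams present: about -0.9
print(round(log_s(table, order, "b a"), 6))  # every step backs off: about -3.0
```

For a real model, the lines would come from the opened ARPA file instead of `TOY_ARPA.splitlines()`.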

Beka
2

What about the ARPA package?

It's rather lightweight, and its API is intuitive and easy to learn. Although it's not as fast as kenlm, you may still want to give it a try.

https://pypi.org/project/arpa/

Magz