I have a simple English text file:
I'm Harry Potter
Harry Potter is young wizard
Hermione Granger is Harry friend
There are seven fantasy novels of Harry Potter
I'm running the following command:
lmplz -o 3 <myTest.txt >myTest.arpa
And I get the following error:
/adjust_counts.cc:60 in void lm::builder::{anonymous}::StatCollector::CalculateDiscounts(const lm::builder::DiscountConfig&) threw BadDiscountException because `discounts_[i].amount[j] < 0.0 || discounts_[i].amount[j] > j'.
ERROR: 1-gram discount out of range for adjusted count 2: -0.5999999. This means modified Kneser-Ney smoothing thinks something is weird about your data. To override this error for e.g. a class-based model, rerun with --discount_fallback
If I run it with the --discount_fallback parameter, it works.
- What is wrong with my text file?
- What does adding the --discount_fallback parameter mean?
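
For reference, the -0.5999999 in the error message can be reproduced by hand. The sketch below is an illustration, not KenLM's code. It assumes the closed-form modified Kneser-Ney discount estimate from Chen and Goodman, D_k = k - (k+1) * Y * n_(k+1) / n_k with Y = n_1 / (n_1 + 2 * n_2), and it assumes that the adjusted count of a unigram is the number of distinct words seen directly before it. The function names are hypothetical, and `<s>` itself is left out of the statistics, which may differ from what lmplz does internally.

```python
from collections import Counter, defaultdict

# The four lines of myTest.txt from the question.
corpus = [
    "I'm Harry Potter",
    "Harry Potter is young wizard",
    "Hermione Granger is Harry friend",
    "There are seven fantasy novels of Harry Potter",
]


def unigram_adjusted_counts(lines):
    """Assumed adjusted count: number of distinct words seen directly before a word."""
    left_contexts = defaultdict(set)
    for line in lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            left_contexts[word].add(prev)
    return {word: len(ctx) for word, ctx in left_contexts.items()}


def mkn_discounts(adjusted):
    """Closed-form discounts D_1, D_2, D_3 from the count-of-counts n_k."""
    n = Counter(adjusted.values())  # n[k] = how many words have adjusted count k
    y = n[1] / (n[1] + 2 * n[2])
    return {k: k - (k + 1) * y * n[k + 1] / n[k] for k in (1, 2, 3)}


adjusted = unigram_adjusted_counts(corpus)
discounts = mkn_discounts(adjusted)
print(Counter(adjusted.values()))  # 13 words with count 1, one each with 2, 3, 4
print(discounts[2])                # about -0.6, the value reported in the error
```

Under these assumptions only "is" has adjusted count 2 and only "</s>" has adjusted count 3, so D_2 = 2 - 3 * (13/15) * (1/1), which is about -0.6. A discount has to lie between 0 and the count it applies to, and a corpus of four sentences does not contain enough distinct words for the count-of-counts to give a sensible estimate.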