I'm trying to use the option --token-regex '[\p{L}\p{M}]+' together with the usual import commands so that MALLET can read German text. No error message is shown and a new file is created, but it is suspiciously small.
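As a sanity check on the pattern itself, here is a small Python sketch of what [\p{L}\p{M}]+ should extract from German text. It assumes MALLET keeps each regex match as one token. Python's built-in re module does not support \p{L}/\p{M}, so the sketch emulates the class with unicodedata.category (letter categories L*, mark categories M*). The second function is hypothetical. It models what would happen if the shell did not strip the single quotes and they reached the regex as literal apostrophes. Windows cmd.exe, for example, does not treat ' as a quoting character.

```python
import unicodedata


def is_letter_or_mark(ch: str) -> bool:
    # General categories L* (letters) and M* (combining marks) are what
    # \p{L} and \p{M} select in Java's regex syntax.
    return unicodedata.category(ch)[0] in ("L", "M")


def tokenize(text: str) -> list[str]:
    r"""Emulate repeated matching of [\p{L}\p{M}]+: every maximal run of
    letters and marks is one token; everything else is a separator."""
    tokens: list[str] = []
    start = None
    for i, ch in enumerate(text):
        if is_letter_or_mark(ch):
            if start is None:
                start = i
        elif start is not None:
            tokens.append(text[start:i])
            start = None
    if start is not None:
        tokens.append(text[start:])
    return tokens


def tokenize_with_literal_quotes(text: str) -> list[str]:
    r"""Hypothetical: emulate '[\p{L}\p{M}]+' when the apostrophes are part
    of the pattern, so only a run wrapped in '...' counts as a token."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        if text[i] == "'":
            j = i + 1
            while j < len(text) and is_letter_or_mark(text[j]):
                j += 1
            # A match needs at least one letter/mark and a closing apostrophe.
            if j > i + 1 and j < len(text) and text[j] == "'":
                tokens.append(text[i:j + 1])
                i = j + 1
                continue
        i += 1
    return tokens


if __name__ == "__main__":
    sample = "Die Größe der Straße, über 3 Brücken."
    print(tokenize(sample))  # ['Die', 'Größe', 'der', 'Straße', 'über', 'Brücken']
    print(tokenize_with_literal_quotes(sample))  # []
    print(tokenize_with_literal_quotes("Er sagte 'Hallo' und ging."))  # ["'Hallo'"]
```

If the first function yields the expected words for a sample of the corpus but the imported file is still tiny, the quoting is one thing worth ruling out (an assumption, not confirmed by the output above). On cmd.exe the option would be written with double quotes instead.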
Then, when I run train-topics to build a topic model, the following error message is shown:
3 5
4 5
5 5
6 5
7 5
8 5
9 5
Infinite value after topic 0 0
<350> LL/token: ´┐¢
Infinite value after topic 0 0
<360> LL/token: ´┐¢
Infinite value after topic 0 0
<370> LL/token: ´┐¢
Infinite value after topic 0 0
<380> LL/token: ´┐¢
Infinite value after topic 0 0
<390> LL/token: ´┐¢
I've been trying to fix this for hours with different --token-regex patterns, but nothing seems to work. Any help would be greatly appreciated.