infinity value error after token regex command

Question

I'm trying to use the option --token-regex '[\p{L}\p{M}]+', along with the usual commands for importing text, so that Mallet can read German text. No error message is shown and a new file is created, but it is suspiciously small. Then, when I use train-topics to run a topic model, the following output is shown:

3       5
4       5
5       5
6       5
7       5
8       5
9       5
Infinite value after topic 0 0
<350> LL/token: ´┐¢
Infinite value after topic 0 0
<360> LL/token: ´┐¢
Infinite value after topic 0 0
<370> LL/token: ´┐¢
Infinite value after topic 0 0
<380> LL/token: ´┐¢
Infinite value after topic 0 0
<390> LL/token: ´┐¢

I've been trying to fix this for hours using different token regex commands, but nothing seems to work. Any help would be greatly appreciated.

I ran into the same problem on Windows when I tried Gensim's wrapper for Mallet (it didn't appear to be related to regex commands). Switching to Linux fixed it for me. – MrFancypants, Dec 06 '14 at 15:49

score -2 · Answer 1 · edited May 23 '17 at 11:56


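If you are using Windows, try double quotes instead of single quotes. cmd.exe does not treat single quotes as quoting characters, so they are passed through to Mallet as part of the regex: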

--token-regex "[\p{L}\p{M}]+"

UPD: you can find the discussion on "single vs double quotes in cmd.exe" here: What does single quote do in windows batch files?
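To illustrate why literal single quotes could produce a suspiciously small import file, here is a minimal sketch in Python. It is not Mallet code: Python's standard `re` module has no `\p{L}` or `\p{M}` classes, so `[^\W\d_]+` (runs of Unicode letters) is used as a stand-in, and the sample sentence is made up. Whether this also accounts for the "Infinite value" messages is an assumption.

```python
import re

# Stand-in for Mallet's [\p{L}\p{M}]+ (assumption: Python's stdlib `re`
# lacks \p{...} classes, so runs of Unicode letters are matched instead).
LETTERS = r"[^\W\d_]+"


def tokenize(pattern, text):
    """Return every non-overlapping match of `pattern` in `text`."""
    return re.findall(pattern, text)


# Hypothetical German sample text, for illustration only.
text = "Die Straße ist 'schön' und über 3 km lang"

# A POSIX shell strips the single quotes, so the regex arrives bare.
unquoted = tokenize(LETTERS, text)

# cmd.exe leaves the single quotes in place, so they become part of the
# regex and only letter runs wrapped in apostrophes can match.
quoted = tokenize("'" + LETTERS + "'", text)

print(len(unquoted), unquoted)
print(len(quoted), quoted)
```

With the bare pattern every word is kept as a token, while the apostrophe-wrapped pattern keeps only the single word that happens to be quoted in the text, which would leave almost nothing for train-topics to work with.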

edited May 23 '17 at 11:56

Community


answered Mar 26 '15 at 08:47

user1520759


Even though people often do just want the answer, it is preferable if you provide an explanation. – thecoshman Mar 26 '15 at 09:02
OK, thanks for the useful suggestion. However, downvotes on my very first attempt to help people on Stack Overflow are very discouraging. – user1520759 Mar 26 '15 at 09:17
