
I'm trying to use the option --token-regex '[\p{L}\p{M}]+' together with the usual import options so that Mallet can read German text. No error message is shown and a new file is created, but it is suspiciously small. When I then run train-topics to build a topic model, the following output is shown:

3       5
4       5
5       5
6       5
7       5
8       5
9       5
Infinite value after topic 0 0
<350> LL/token: ´┐¢
Infinite value after topic 0 0
<360> LL/token: ´┐¢
Infinite value after topic 0 0
<370> LL/token: ´┐¢
Infinite value after topic 0 0
<380> LL/token: ´┐¢
Infinite value after topic 0 0
<390> LL/token: ´┐¢

I've been trying to fix this for hours using different token-regex values, but nothing seems to work. Any help would be greatly appreciated.

Alexander Karmes
blub123
  • I ran into the same problem on Windows when I tried Gensim's wrapper for Mallet. (It didn't appear to be related to regex commands.) Switching to Linux fixed it for me. – MrFancypants Dec 06 '14 at 15:49

1 Answer

-2

If you are using Windows, try something like:

--token-regex "[\p{L}\p{M}]+"

Update: for a discussion of single vs. double quotes in cmd.exe, see: What does single quote do in windows batch files?
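To illustrate why the quoting could matter, here is a minimal Java sketch. It rests on two assumptions that the question does not confirm: that cmd.exe hands the single quotes to Mallet as literal characters of the regex, and that tokens are extracted roughly the way java.util.regex finds successive matches. The class name TokenRegexDemo and the helper tokenize are hypothetical, made up for this example, and are not part of Mallet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenRegexDemo {
    // Hypothetical helper: collect every substring of text that matches regex.
    // This only approximates how a token regex picks tokens out of a document.
    static List<String> tokenize(String regex, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Die Größe der Straße", written with Unicode escapes so the
        // source file compiles the same way under any file encoding.
        String text = "Die Gr\u00f6\u00dfe der Stra\u00dfe";

        // The regex as intended: runs of Unicode letters and marks.
        System.out.println(tokenize("[\\p{L}\\p{M}]+", text).size());

        // The regex with literal apostrophes around it (the assumed effect of
        // single quotes in cmd.exe): a token must now be wrapped in ' ... '.
        System.out.println(tokenize("'[\\p{L}\\p{M}]+'", text).size());
    }
}
```

With the intended regex the sample sentence yields four tokens. With the apostrophes included it yields none, because the text contains no apostrophes. If that is what happens during import, it would be consistent with the suspiciously small output file.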

Community
  • Even though people often just want the answer, it is preferable to provide an explanation. – thecoshman Mar 26 '15 at 09:02
  • OK, thanks for the useful suggestion. However, minuses for my very first attempt to help people on stackoverflow look very discouraging. – user1520759 Mar 26 '15 at 09:17