I have a large text (read: LARGE). I need to tokenize it into words, treating every non-letter as a delimiter. I am using StringTokenizer to read one word at a time. However, while researching how to express "every non-letter" as the delimiter string, instead of doing something like:
new StringTokenizer(text, "\" ();,.'[]{}!?:”“…\n\r0123456789 [etc etc]");
I found that everyone basically hates StringTokenizer (why?).
So, what can I use instead? Please don't suggest String.split, as it would duplicate my large text in memory. I need to go through the text word by word, delimiting on every non-letter. Is it easier to build something on my own, or is there a best-practice way to approach this problem?
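To make the requirement concrete, here is a minimal sketch of the behavior I am after. It assumes a java.util.regex Matcher scanning the text in place is acceptable and that "letter" means any Unicode letter (`\p{L}`). The names WordScanner, forEachWord and words are made up for illustration, not from any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: walks a text word by word, treating every non-letter as a delimiter.
final class WordScanner {
    // \p{L}+ matches a maximal run of Unicode letters; anything else separates words.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    // Hands each word to the callback in order of appearance.
    // The matcher scans the input in place; only one small String per word is created.
    static void forEachWord(CharSequence text, Consumer<String> action) {
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            action.accept(m.group());
        }
    }

    // Convenience for small inputs: collects every word into a list.
    // This holds all words in memory, so it is not meant for the large text itself.
    static List<String> words(CharSequence text) {
        List<String> out = new ArrayList<>();
        forEachWord(text, out::add);
        return out;
    }

    public static void main(String[] args) {
        // Digits, punctuation, quotes and whitespace all act as delimiters here.
        forEachWord("Hello, world! It's 2024 (really)…", System.out::println);
    }
}
```

Note that with this rule an apostrophe is a delimiter too, so "It's" comes out as "It" and "s". Whether that is desirable depends on the text.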
Thanks in advance!