I want to write a regular expression to remove all tokens in a text file that do not contain at least one letter. I used the OpenNLP tokenizer to extract the tokens of my text file. For instance, tokens such as 90-87, 65@7, ---, 8/0 and ? should be removed from the text.
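The requirement can also be stated the other way around: a token should be removed exactly when it consists only of non-letter characters. Below is a minimal sketch of that test, assuming each token is a separate Java `String`. The class name `NoLetterTokens` and the helper `hasNoLetter` are hypothetical. It uses Java's Unicode category `\P{L}` ("any character that is not a letter").

```java
import java.util.ArrayList;
import java.util.List;

public class NoLetterTokens {
    // \P{L} is any character that is NOT a Unicode letter, so a token made only
    // of such characters (digits, punctuation, symbols) contains no letter at all.
    // String.matches() requires the whole token to match the pattern.
    static boolean hasNoLetter(String token) {
        return token.matches("\\P{L}+");
    }

    public static void main(String[] args) {
        // Sample tokens taken from the examples in the question
        String[] tokens = {"90-87", "65@7", "---", "8/0", "?", "anti-age", "mid-november"};
        List<String> removed = new ArrayList<>();
        for (String token : tokens) {
            if (hasNoLetter(token)) {
                removed.add(token);
            }
        }
        System.out.println(removed); // [90-87, 65@7, ---, 8/0, ?]
    }
}
```

Using `\P{L}` instead of `[^a-zA-Z]` also treats accented letters (for example "é") as letters, which may or may not be what you want.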
I tried to follow these pages 1, 2 and 3, but I could not find the expression I want. For example, the following code also removes tokens such as anti-age and mid-november:
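```java
String[] tokens = t.getTokens(sen);
for (String word : tokens) {
    // matches() must match the ENTIRE token, so only tokens made up
    // solely of ASCII letters are written to the output
    if (!isstopWord(word) && word.matches("[a-zA-Z]+")) {
        bufferedw.append(word + "\n");
    }
}
```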
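The behaviour comes from `String.matches()`, which succeeds only if the pattern matches the whole string. The hyphen in "anti-age" is not in `[a-zA-Z]`, so the whole-token match fails. The small demo below compares the original pattern with two alternatives. The class and constant names are hypothetical.

```java
public class MatchesDemo {
    // Pattern from the question: the whole token must be ASCII letters only
    static final String LETTERS_ONLY = "[a-zA-Z]+";
    // Anything is allowed around the token, as long as one ASCII letter appears
    static final String AT_LEAST_ONE_LETTER = ".*[a-zA-Z].*";
    // Stricter alternative: runs of letters joined by single hyphens
    static final String HYPHENATED_WORD = "[a-zA-Z]+(-[a-zA-Z]+)*";

    public static void main(String[] args) {
        System.out.println("anti-age".matches(LETTERS_ONLY));        // false
        System.out.println("anti-age".matches(AT_LEAST_ONE_LETTER)); // true
        System.out.println("anti-age".matches(HYPHENATED_WORD));     // true
        System.out.println("90-87".matches(AT_LEAST_ONE_LETTER));    // false
    }
}
```

`AT_LEAST_ONE_LETTER` implements the "contains at least one letter" rule from the question. It would also keep a token like "65a7". `HYPHENATED_WORD` is narrower and rejects such mixed tokens.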
However, I do not know how to prevent tokens like anti-age from being removed. Where is the problem?
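Here is a sketch of the loop with the filter changed from "letters only" to "contains at least one letter". It assumes two things. First, a literal token array stands in for `t.getTokens(sen)`, so the sketch runs without OpenNLP. Second, `isstopWord` is a hypothetical stub with a tiny stop list, because the real one is not shown. Output goes to a `StringWriter` instead of a file so the result can be inspected. `Set.of` requires Java 9 or later.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Set;

public class TokenFilter {
    // Hypothetical stand-in for the question's stop list
    private static final Set<String> STOP_WORDS = Set.of("the", "a", "of");

    // Hypothetical stub for isstopWord(); the real implementation is not shown
    static boolean isstopWord(String word) {
        return STOP_WORDS.contains(word.toLowerCase());
    }

    static String filter(String[] tokens) {
        StringWriter out = new StringWriter();
        try (BufferedWriter bufferedw = new BufferedWriter(out)) {
            for (String word : tokens) {
                // ".*[a-zA-Z].*" matches the whole token if ANY ASCII letter occurs,
                // so "anti-age" is kept while "90-87" and "?" are dropped
                if (!isstopWord(word) && word.matches(".*[a-zA-Z].*")) {
                    bufferedw.append(word).append("\n");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The BufferedWriter has been closed (and flushed) at this point
        return out.toString();
    }

    public static void main(String[] args) {
        // Literal tokens in place of t.getTokens(sen)
        String[] tokens = {"the", "anti-age", "cream", "90-87", "?", "mid-november"};
        System.out.print(filter(tokens));
    }
}
```

Only the regular expression in the `if` condition differs from the original loop. Swap in the stricter hyphenated-word pattern if tokens mixing digits and letters should also be dropped.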