I am writing code to detect bad keywords in a file. Here are the steps that I follow:
- Tokenize the file using StreamTokenizer
- Use a pattern matcher to find the bad-keyword matches
```java
while (streamTokenizer.nextToken() != StreamTokenizer.TT_EOF) {
    if (streamTokenizer.ttype == StreamTokenizer.TT_WORD) {
        String token = streamTokenizer.sval.trim().replaceAll("\\\\n", "");
        final Matcher matcher = badKeywordPattern.matcher(token);
        if (matcher.find()) {
            // bad token found
            return true;
        }
    }
}
```
The call

```java
String token = streamTokenizer.sval.trim().replaceAll("\\\\n", "");
```

is meant to handle tokens that span multiple lines via a trailing backslash, so that they are matched as a single token. Example:

```
bad\
token
```
However, the replaceAll call is not removing the line continuation, and the split token is never matched. Any suggestions? Is there another way to do this?
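For reference, here is a minimal, self-contained sketch of an alternative approach I have been considering: read the whole text, join backslash-continued lines first, and only then run the bad-keyword pattern. The pattern and class name below are placeholders, not my real code; note that in a Java regex, matching a literal backslash followed by a newline takes the string literal "\\\\\r?\n".

```java
import java.util.regex.Pattern;

public class BadKeywordScanner {
    // Placeholder pattern; the real badKeywordPattern is built elsewhere.
    private static final Pattern BAD_KEYWORD = Pattern.compile("badtoken");

    // Join lines continued with a trailing backslash, then scan the result.
    static boolean containsBadKeyword(String content) {
        // Regex \\\r?\n : a literal backslash, an optional CR, then LF.
        String joined = content.replaceAll("\\\\\r?\n", "");
        return BAD_KEYWORD.matcher(joined).find();
    }
}
```

With this, an input containing "bad\" on one line and "token" on the next is joined to "badtoken" before matching, so the split keyword is found.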