I am observing that CoreNLP 3.9.2 has started splitting 'enti_ties' into multiple tokens ('enti', '_', 'ties') during tokenization.
I have tried setting tokenize.whitespace, which solves this problem. But I think it will also stop the tokenizer from splitting contractions such as "can't" and "don't".
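
To illustrate the trade-off I am worried about, here is a minimal Python sketch that mimics whitespace-only tokenization with plain `str.split`. It is not CoreNLP itself, and `whitespace_tokenize` is a hypothetical helper name used only for this example; I am assuming that tokenize.whitespace behaves roughly like this.

```python
def whitespace_tokenize(text):
    """Hypothetical stand-in for whitespace-only tokenization (not CoreNLP)."""
    # Split only on runs of whitespace; underscores and apostrophes
    # are left inside the token they belong to.
    return text.split()


tokens = whitespace_tokenize("I can't find the enti_ties and don't know why")
print(tokens)
```

With this kind of splitting, 'enti_ties' stays in one piece, but "can't" and "don't" also stay whole instead of being broken into separate contraction tokens, which is the side effect I would like to avoid.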