I am trying to preprocess text before passing it to the StanfordCoreNLP server. Some of my text looks like this.
" Conversion of code written in C# to Visual Basic .NET (VB.NET)."
The ".NET" confuses the server because the leading dot is treated as a sentence-ending period, which splits the single sentence into two. I want to replace a '.' that appears at the start of a word with 'DOT' so that the sentence stays intact. Note that I don't want to change anything in 'VB.NET', because StanfordCoreNLP recognizes that as one word (proper noun).
This is what I tried so far.
print(re.sub(r"\.(\S+)", r"DOT\g<0>", text))
The result looks like this.
Conversion of code written in C# to Visual Basic DOT.NET (VBDOT.NET).
I tried adding word boundaries to the pattern, r"\b\.(\S+)\b", but it didn't work.
Any help would be appreciated.
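For reference, here is a minimal sketch of the behaviour described above, assuming the goal is to replace only the dot itself (giving "DOTNET") and that "in front of a word" means the dot is not preceded by a word character. It uses lookarounds instead of `\b`: since '.' is not a word character, `\b\.` can only match when a word character comes before the dot, which is the 'VB.NET' case rather than the standalone '.NET' case. The variable names are illustrative only.

```python
import re

text = " Conversion of code written in C# to Visual Basic .NET (VB.NET)."

# (?<!\w)  the dot must NOT be preceded by a word character (leaves VB.NET alone)
# \.       a literal dot
# (?=\w)   the dot must be followed by a word character (leaves the final full stop alone)
pattern = re.compile(r"(?<!\w)\.(?=\w)")

result = pattern.sub("DOT", text)
print(result)
```

If the original dot should be kept after the marker, the replacement string could be "DOT." instead; that is a guess at an alternative, not something the question specifies.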