I have a database of book paragraphs and a database of phrases, from which I am trying to generate an HTML file. For simplicity, let's say I want to make some phrases bold.
I have the following sentence:
Following that mission I was sent on further missions. Most of them succeeded and we are living with the consequences today. They must remain secret. If the truth ever came out, then some of the national leaders at the time would find themselves in the Hague on war crimes charges.
I want to make the word 'mission' bold, which is simple: I use just the word 'mission' as the pattern. But sometimes the words of a phrase are not next to each other. In the example sentence, such a phrase is 'on charges'. So I figured I could store a wildcard character in the database and replace it later in the program; in the database the phrase is 'on % charges'. Additionally, I don't want words that are not part of the phrase to be bolded.
So the output should be
<b>on</b> war crimes <b>charges</b>
My Java code to do this is as follows (w.getWord() returns 'on % charges'):
String pat = w.getWord().replace("%", ".+?").trim();
Pattern p = Pattern.compile("(\\W)(" + pat + "){1}?(\\W)");
text = text.replaceFirst(p.pattern(), "$1<b>$2</b>$3");
The problem with this sentence is that when another 'on' occurs earlier in the sentence, too much gets bolded, and the output is:
Following that mission I was sent <b>on further missions. Most of them succeeded and we are living with the consequences today. They must remain secret. If the truth ever came out, then some of the national leaders at the time would find themselves in the Hague on war crimes charges</b>
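One direction that might work, shown here only as a minimal sketch: split the stored phrase on the % wildcard, wrap each literal word in its own capture group, and replace the lazy .+? with a gap that is not allowed to run across another occurrence of the first word. That way the match starts at the nearest 'on' before 'charges', and only the phrase words are wrapped in <b>. The class name PhraseBolder and the method boldPhrase are hypothetical, and the sketch assumes every phrase word begins and ends with a word character (so that \b applies) and that guarding against a repeat of the first word is enough for the phrases in the database.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseBolder {
    // Hypothetical helper: bolds only the literal words of a phrase such as "on % charges".
    static String boldPhrase(String text, String phrase) {
        // "on % charges" -> ["on", "charges"]; a phrase without % yields a single part.
        String[] parts = phrase.trim().split("\\s*%\\s*");
        // Lazy gap that may not cross another whole-word occurrence of the first word,
        // so the match begins at the closest preceding "on" rather than the earliest one.
        String gap = "((?:(?!\\b" + Pattern.quote(parts[0]) + "\\b).)+?)";
        StringBuilder regex = new StringBuilder();
        StringBuilder replacement = new StringBuilder();
        int group = 1;
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(gap);
                // Text between phrase words is copied back unchanged.
                replacement.append('$').append(group++);
            }
            regex.append("\\b(").append(Pattern.quote(parts[i])).append(")\\b");
            replacement.append("<b>$").append(group++).append("</b>");
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(text);
        return m.replaceFirst(replacement.toString());
    }
}
```

Calling PhraseBolder.boldPhrase(text, w.getWord()) on the sentence above should leave the earlier 'sent on further missions' untouched and wrap only the final 'on' and 'charges'. The word boundaries \b replace the (\W) groups from my original pattern, so the surrounding characters no longer need to be captured and re-emitted.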