2

Whenever I find an abbreviation within a sentence (like Mr., Prf. and so on), I would like to delete the '\n' character at the end of each sentence that contains an abbreviated word. Any ideas are welcome.

My idea so far is:

List<String> pres = Arrays.asList("dl", "Dl", "Prf", "Ing");
for (int i = 0; i < pres.size(); i++) {
    if (z.contains(pres.get(i)))
        f = z.indexOf(pres.get(i));
    z = z.replaceFirst("\\n", " "); // how can I use f here to get rid of the next newline?
}

2 Answers

0

Here is an approximate solution, since I don't know the full list of abbreviations you want to check. Search for the following pattern and replace each match with the first capture group:

((?:Mr|Mrs|Dr)\.[^.]+\.)\n

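This matches the last abbreviation in any sentence whose final dot is immediately followed by a \n newline. Note that when a single sentence contains more than one abbreviation, only the last one is matched, because [^.]+ cannot cross the dot of a later abbreviation. The Java snippet below looks for \r\n (the Windows line ending) rather than a bare \n.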

String input = "Here is a sentence.  Said Mrs. Canopoy, here is another sentence about Mr. Potato Head.\r\nHere is a third sentence.";
System.out.println(input);
input = input.replaceAll("((?:Mr|Mrs|Dr)\\.[^.]+\\.)\\r\\n", "$1");
System.out.println(input);


I only check for Mr., Mrs., or Dr., but you may add as many abbreviations as you want to the alternation.
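If the list of abbreviations lives in a collection, as in the question, the alternation can be assembled from it instead of typed by hand. The following is only a minimal sketch under a few assumptions: the class and method names (`AbbreviationJoiner`, `buildPattern`, `joinLines`) are hypothetical, each abbreviation is quoted with `Pattern.quote` so it is treated literally, and the line ending is written as `\r?\n` so that both Windows and Unix newlines are accepted, which goes slightly beyond the pattern above.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of the original answer.
final class AbbreviationJoiner {

    // Builds ((?:A|B|C)\.[^.]+\.)\r?\n from the given abbreviations (passed without the trailing dot).
    static Pattern buildPattern(List<String> abbreviations) {
        String alternation = abbreviations.stream()
                .map(Pattern::quote)               // treat each abbreviation as literal text
                .collect(Collectors.joining("|"));
        // Assumption: \r?\n is used so both "\r\n" and "\n" line endings match.
        return Pattern.compile("((?:" + alternation + ")\\.[^.]+\\.)\\r?\\n");
    }

    // Removes the line break that follows a sentence containing one of the abbreviations.
    static String joinLines(String text, List<String> abbreviations) {
        return buildPattern(abbreviations).matcher(text).replaceAll("$1");
    }
}
```

Calling `AbbreviationJoiner.joinLines(input, Arrays.asList("Mr", "Mrs", "Dr"))` should then behave like the `replaceAll` call above. Keeping the `Pattern` returned by `buildPattern` around, rather than rebuilding it per call, would avoid recompiling the regex when many strings are processed.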

Tim Biegeleisen
  • You'll need to modify the pattern according to your usage. Try this: `input = input.replaceAll("(.*(?:Mr|Ing|dl|Dl|Prf)\\..*)\\n", "$1");` – Timir May 27 '18 at 15:21
  • Having said that, also take a look at [this discussion](https://stackoverflow.com/questions/6262397/string-replaceall-is-considerably-slower-than-doing-the-job-yourself) for why you should avoid `String.replaceAll`. – Timir May 27 '18 at 15:22
  • @Timir I don't see the point of your lookahead, since you're still using a capture group. And I think the OP is not so much concerned with performance here as correctness. But you're right; we could use a formal pattern matcher to get a performance boost. – Tim Biegeleisen May 27 '18 at 15:29
  • It doesn't work; it returns the same thing as the input. Is there any chance that the code can't find the \n? – Bogdan Rogojan May 27 '18 at 15:33
  • @BogdanRogojan _Edit_ your question and show sample input data and the expected output. "doesn't work" doesn't let me help you. I can't code a solution to data I can't see. – Tim Biegeleisen May 27 '18 at 15:34
  • After the . there is a space, which I only noticed just now, and the \n comes right after it. I added \\s after the \\. in the code, but it still doesn't work. This isn't a problem with the abbreviations, because I know which keywords I need for my code; the problem is finding the end of the line. – Bogdan Rogojan May 27 '18 at 15:56
  • @BogdanRogojan The newline on Windows is `\r\n`, just `\n` is for Linux. Try searching for this. Obviously if you expect something between the end of the sentence and the newline, then you'll have to modify my pattern. – Tim Biegeleisen May 27 '18 at 16:01
  • @Tim Biegeleisen It is working now with this regex: `("((?:Mr|Mrs|Dr)\\.+\\s)\\r\\n", "$1");`. Thanks a lot :) – Bogdan Rogojan May 27 '18 at 16:38
0

Just use this:

String s = "Mike and Mr.\nDave take dinner.\nThat is very important.\nMe and Ing.\nMike bla bla..";
// Escape the dots so they match a literal '.' rather than any character.
s = s.replaceAll("(Mr\\.|Ing\\.)\n", "$1 ");
Neb