Java replace regex before and after a period

Question

I am working with XML in an Android app, and the text sometimes comes out with sentences bumped up against each other.

For example: First sentence.Another sentence

I know I need to use [a-z] (lowercase letters), [A-Z] (uppercase letters), and all digits ([0-9]?) to search before and after the period, and then add a space after the period.

Maybe something like:

myString = myString.replaceAll("(\\p{Ll})(\\p{Lu})", "$1 $2");
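To see why this attempt doesn't help, here is a small sketch that runs the pattern above on the example input. The class `AttemptDemo` and method `attempt` are hypothetical names used only for illustration. Because the period sits between the lowercase `e` and the uppercase `A`, the pattern never matches there. It only fires when the two letters touch directly.

```java
class AttemptDemo {
    // The pattern from the question: a lowercase letter immediately followed by an uppercase letter.
    static String attempt(String s) {
        return s.replaceAll("(\\p{Ll})(\\p{Lu})", "$1 $2");
    }

    public static void main(String[] args) {
        // The period sits between 'e' and 'A', so nothing matches and the string is unchanged.
        System.out.println(attempt("First sentence.Another sentence"));
        // It only matches when the letters touch directly, with no period between them.
        System.out.println(attempt("First sentenceAnother sentence"));
    }
}
```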

My searches and efforts have been useless so far, so any and all help is welcome. Thanks.

Couldn't you come up with a better title than `I can not find this regex`? — devnull, Feb 24 '14 at 07:29
Your title sounds like you've [lost your regex, and you need help finding it](https://xkcd.com/1313/). — user2357112, Feb 24 '14 at 07:31
Never parse XML with regex. XML is not a regular language. Use well-known XML parsers instead. See this question: http://stackoverflow.com/questions/8577060/why-is-it-such-a-bad-idea-to-parse-xml-with-regex — Madusudanan, Feb 24 '14 at 07:33
At the time I make edits to the XML, it is already a well-formatted string. — Dustin, Feb 24 '14 at 07:34
At what point are these sentences stuck together without a space? Does the XML itself have sentences joined improperly, with no spaces or tags between them? — user2357112, Feb 24 '14 at 07:35
I have no idea where the problem occurs. I am editing a string obtained from an RSS XML feed that mainly provides information on the web, but for some reason, when I pull it into Android, it comes up missing spaces like these. — Dustin, Feb 24 '14 at 07:38

score 3 · Answer 1 · answered Feb 24 '14 at 07:36

You were almost there, you just forgot to match the dot:

myString = myString.replaceAll("(\\p{Ll})\\.(\\p{Lu})", "$1. $2");
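A minimal runnable sketch of this replacement, wrapped in a hypothetical `CaptureFix.addSpaceAfterPeriod` helper (the class and method names are mine, not from the answer):

```java
class CaptureFix {
    // Capture the lowercase letter before the dot and the uppercase letter after it,
    // then write both back with ". " between them.
    static String addSpaceAfterPeriod(String s) {
        return s.replaceAll("(\\p{Ll})\\.(\\p{Lu})", "$1. $2");
    }

    public static void main(String[] args) {
        System.out.println(addSpaceAfterPeriod("First sentence.Another sentence"));
        // Already spaced correctly: the character after the dot is a space, so no match.
        System.out.println(addSpaceAfterPeriod("First sentence. Another sentence"));
    }
}
```

Text that already has a space after the period is left alone, because the character following the dot is then a space rather than an uppercase letter.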

And since you're not actually doing anything with the letter before and after the dot, you can speed things up a bit by using lookaround assertions:

myString = myString.replaceAll("(?<=\\p{Ll})\\.(?=\\p{Lu})", ". ");
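The same fix with lookarounds, as a sketch in a hypothetical `LookaroundFix` class. Only the dot is consumed by the match; the lookbehind and lookahead just check the neighbouring letters. The second call shows that a dot between two uppercase letters, as in an acronym, is not touched, because the lookbehind requires a lowercase letter.

```java
class LookaroundFix {
    // (?<=\p{Ll}) and (?=\p{Lu}) check the characters around the dot
    // without including them in the match, so only the dot is replaced.
    static String addSpaceAfterPeriod(String s) {
        return s.replaceAll("(?<=\\p{Ll})\\.(?=\\p{Lu})", ". ");
    }

    public static void main(String[] args) {
        System.out.println(addSpaceAfterPeriod("First sentence.Another sentence"));
        // Every dot in the acronym follows an uppercase letter, so nothing changes.
        System.out.println(addSpaceAfterPeriod("Ask the C.I.A. today."));
    }
}
```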

Tim Pietzcker

Of course, now we're putting extra spaces into acronyms written with periods. We could try to tell whether we're looking at an acronym, but then we run into *more* edge cases. Natural language correction is messy. – user2357112 Feb 24 '14 at 07:39
Yes, but this still misses the fact that the character before and after the period could be a digit, a lowercase letter, or an uppercase letter. – Dustin Feb 24 '14 at 07:40
I know this is a messy thing to edit, but I think there will be very few of these cases. – Dustin Feb 24 '14 at 07:41
If you also want to replace dots after uppercase letters and digits, just use `[\\p{L}\\d]` instead of `\\p{Ll}`, but then you'd also replace `C.I.A.` with `C. I. A.`. – Tim Pietzcker Feb 24 '14 at 08:12
@TimPietzcker: Didn't see that the lookarounds were specifically lowercase and uppercase. It means we're missing *different* weird edge cases, but C.I.A. is currently fine. – user2357112 Feb 24 '14 at 08:13
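A sketch of the widened version Tim Pietzcker suggests, in a hypothetical `WideLookbehindFix` class. Only the lookbehind changes to `[\\p{L}\\d]`; the lookahead still requires an uppercase letter. The second call reproduces the acronym caveat from the comments.

```java
class WideLookbehindFix {
    // Lookbehind widened to any letter or digit; the lookahead still
    // requires an uppercase letter after the dot.
    static String addSpaceAfterPeriod(String s) {
        return s.replaceAll("(?<=[\\p{L}\\d])\\.(?=\\p{Lu})", ". ");
    }

    public static void main(String[] args) {
        // A digit before the dot now counts.
        System.out.println(addSpaceAfterPeriod("Version 2.Next release"));
        // Side effect: dots inside an acronym are now split apart.
        System.out.println(addSpaceAfterPeriod("Call the C.I.A."));
    }
}
```

Widening the lookahead to digits as well would cover the remaining case from the question, but it would also turn a decimal such as `3.14` into `3. 14`.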
Hi, can I optimize this? I am calling it like #1 in my web application for generating SQL statements. – shareef Mar 25 '18 at 14:18
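One common optimization when the same replacement runs many times is to compile the pattern once and reuse it. `String.replaceAll(regex, repl)` is documented as equivalent to `Pattern.compile(regex).matcher(str).replaceAll(repl)`, so it compiles the regex again on every call. This is only a sketch (the `PrecompiledFix` class and `MISSING_SPACE` constant are hypothetical names), and how much it actually saves depends on how often it is called.

```java
import java.util.regex.Pattern;

class PrecompiledFix {
    // Compiled once when the class is initialized, then reused. Pattern instances are
    // immutable and safe to share between threads; each call gets its own Matcher.
    private static final Pattern MISSING_SPACE =
            Pattern.compile("(?<=\\p{Ll})\\.(?=\\p{Lu})");

    static String addSpaceAfterPeriod(String s) {
        return MISSING_SPACE.matcher(s).replaceAll(". ");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"One.Two", "First sentence.Another sentence"}) {
            System.out.println(addSpaceAfterPeriod(s));
        }
    }
}
```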
