I am working on splitting strings in Java. I want to split a String on ". uppercase" (a ".", then a space, then an uppercase letter), for example:

". A" ". B" ". C"...

I also want to keep both the "." and the uppercase letter. Is there an efficient way to do that? Previously I used

String.split("\\.\\s") 

but it removes the ".", so it's not an ideal solution. Thanks.

Sample result

String = This is an Egg. This is a dog. "I just come up with this example"
String[0] = This is an Egg.
String[1] = This is a dog. "I just come up with this example"

More edit:

One more issue: the usual approach seems to keep the whole delimiter in one of the resulting strings, but I want the delimiter itself to be split, with the "." staying in the first string and the uppercase letter in the second (in my example, the ". [A-Z]" is split too).

JLTChiu
  • It would help if you provided an example String input, and the String[] output tokens you expect. – ulmangt Apr 21 '12 at 15:26
  • The problem is that the split function doesn't return the delimiter as part of the strings returned. You can write your own split pretty easily with substring and manually keep the string as you want. – twain249 Apr 21 '12 at 15:27
  • exact copy of http://stackoverflow.com/questions/275768/is-there-a-way-to-split-strings-with-string-split-and-include-the-delimiters – Serdalis Apr 21 '12 at 15:31

2 Answers

3

You can use a lookaround:

str.split("(?<=\\.\\s+)(?=\\p{Lu})")

This will split "First sentence. Foo bar. test" into the array

{ "First sentence. ", 
  "Foo bar. test" }

If you don't want the space to be included, just put it between the lookaround assertions:

str.split("(?<=\\.)\\s+(?=\\p{Lu})")

This will result in

{ "First sentence.", 
  "Foo bar. test" }

For the example string above.
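
As a quick check, here is a minimal, self-contained sketch of both variants. The class and method names (`SentenceSplitDemo`, `splitDroppingSpace`, `splitKeepingSpace`) are made up for illustration. The space-keeping variant assumes exactly one whitespace character in the lookbehind (`\\s` rather than `\\s+`), because, as far as I know, older Java versions reject unbounded quantifiers inside a lookbehind.

```java
import java.util.Arrays;

public class SentenceSplitDemo {

    // Splits on whitespace that is preceded by "." and followed by an
    // uppercase letter. The whitespace itself is consumed and dropped.
    static String[] splitDroppingSpace(String text) {
        return text.split("(?<=\\.)\\s+(?=\\p{Lu})");
    }

    // Splits at the zero-width position after "." plus ONE whitespace
    // character (an assumption, see above) and before an uppercase letter.
    // Nothing is consumed, so the whitespace stays in the preceding piece.
    static String[] splitKeepingSpace(String text) {
        return text.split("(?<=\\.\\s)(?=\\p{Lu})");
    }

    public static void main(String[] args) {
        String text = "First sentence. Foo bar. test";

        // prints [First sentence., Foo bar. test]
        System.out.println(Arrays.toString(splitDroppingSpace(text)));

        // prints [First sentence. , Foo bar. test]
        System.out.println(Arrays.toString(splitKeepingSpace(text)));
    }
}
```

Neither variant splits after "bar." because the next letter, "t", is lowercase. In both cases the "." stays with the first piece and the uppercase letter with the second.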

Niklas B.
  • Hmm...It left the delimiter at the second string, while I wish to save the first part at the first string and the second part at the second string – JLTChiu Apr 21 '12 at 15:35
  • @JLTChiu: Edited to use a lookbehind as well as lookahead. – Niklas B. Apr 21 '12 at 15:40
  • @JLTChiu: Please check my new edit. I changed `[A-Z]` to `\p{Lu}` for the regex to work with non-ASCII texts. – Niklas B. Apr 21 '12 at 15:48
-2

Here you go:

Use str = str.replace(". ", ".##"); (replace takes a literal string, whereas replaceAll would treat the "." as a regex wildcard)

Then use str.split("##")

This will give you the required strings.

Check this link

Vipin Jain
  • I didn't downvote this but this is far from what the OP is asking. – Lion Apr 21 '12 at 15:41
  • What he is asking is to split the individual lines. i think this method will help him out. – Vipin Jain Apr 21 '12 at 15:45
  • @VIPIN: No, he was asking to keep the delimiter while splitting. – Niklas B. Apr 21 '12 at 15:49
  • @VIPIN: Did you downvote me out of revenge or what? That wouldn't be fair at all. – Niklas B. Apr 21 '12 at 15:59
  • lol no bud but the solution is not correct according to me and you can check this out – Vipin Jain Apr 21 '12 at 16:00
  • @VIPIN: Which solution is not correct? I already removed my downvote because you are correct that your solution works to a limited degree. It doesn't fulfil the specification, though. For example, it doesn't work if there is a `##` already in the string or if the whitespace is not followed by an uppercase character or if `\t` or `\n` is used as whitespace. – Niklas B. Apr 21 '12 at 16:01
  • any way thanks:) and my goal is to give him a solution so that he can modify it according to his needs:) – Vipin Jain Apr 21 '12 at 16:20
  • @VIPIN: Do you think OP is too stupid to comprehend lookaround? I doubt that. – Niklas B. Apr 21 '12 at 16:26
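
The three limitations listed in the comments above can be checked with a rough sketch that compares the marker approach with the lookaround split. The class and method names (`MarkerSplitDemo`, `splitWithMarker`, `splitWithLookaround`) are hypothetical, and the marker version assumes a literal `String.replace` of ". " with ".##" rather than a regex replacement.

```java
public class MarkerSplitDemo {

    // Marker approach: literally replace ". " with ".##", then split on "##".
    static String[] splitWithMarker(String text) {
        return text.replace(". ", ".##").split("##");
    }

    // Lookaround approach: split on whitespace between "." and an uppercase letter.
    static String[] splitWithLookaround(String text) {
        return text.split("(?<=\\.)\\s+(?=\\p{Lu})");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Foo bar. test",          // whitespace not followed by an uppercase letter
            "Use ## here. Next one",  // marker already present in the text
            "One.\tTwo"               // tab instead of a space
        };
        for (String input : inputs) {
            System.out.println(splitWithMarker(input).length
                    + " vs " + splitWithLookaround(input).length);
        }
        // prints:
        // 2 vs 1
        // 3 vs 2
        // 1 vs 2
    }
}
```

In each case the marker approach returns a different number of pieces than the lookaround split: it splits before a lowercase word, it splits on the pre-existing `##`, and it misses the tab-separated sentence.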