
I need to split a text into sentences at each period (`.`), but the text also contains numbers such as `1.000`. How can I write a `split` on `.` that ignores periods between digits?

Example:

"I paid 1.000 dollars. Very expensive. But I think today it should be cheaper."

I got this:

I paid 1.
000 dollars.
Very expensive.
But I think today it should be cheaper.

But I need this:

I paid 1.000 dollars.
Very expensive.
But I think today it should be cheaper.

PPavesi
  • 2
    Well, don't `split`. Instead iterate over the chars; if you detect a `.`, check whether the next one is numeric. If so, don't split; if not, create a new string from the part you just read. – M. Deinum Dec 15 '22 at 15:09
  • 2
    Does this answer your question? [Regex for splitting into sentences, ignoring decimal numbers as part of the split?](https://stackoverflow.com/questions/52208602/regex-for-splitting-into-sentences-ignoring-decimal-numbers-as-part-of-the-spli) – Marvin Dec 15 '22 at 15:27
  • 2
    Possibly related: [How to split paragraphs into sentences?](https://stackoverflow.com/q/21430447) – Pshemo Dec 15 '22 at 15:37

3 Answers

Using the regex from this answer, you can do the following:

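    public static String[] split(String str) {
        // Split on one or more '.' or '!' not followed by a digit (plus trailing whitespace),
        // or on line breaks. Inside a character class, '.' and '!' need no escaping.
        return str.split("[.!]+(?!\\d)\\s*|\\n+\\s*");
    }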

The result (the matched delimiters are removed, so the trailing periods are dropped):

I paid 1.000 dollars
Very expensive
But I think today it should be cheaper
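
If the trailing periods should be kept, as in the expected output in the question, one possible variation is to split on the whitespace that follows the punctuation instead of on the punctuation itself. This is only a sketch: the class and method names are made up for illustration, and it assumes sentences are separated by whitespace, so abbreviations such as "e.g. " would also be split.

```java
class KeepPunctuationSplit {
    // Hypothetical helper: splits on whitespace that directly follows '.', '!' or '?'.
    // The lookbehind is zero-width, so the punctuation stays attached to its sentence.
    // "1.000" is left alone because its dot is not followed by whitespace.
    static String[] splitKeepingPunctuation(String text) {
        return text.split("(?<=[.!?])\\s+");
    }

    public static void main(String[] args) {
        String text = "I paid 1.000 dollars. Very expensive. But I think today it should be cheaper.";
        for (String sentence : splitKeepingPunctuation(text)) {
            System.out.println(sentence);
        }
    }
}
```

For the example sentence this prints the three sentences, each with its period.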


Albina

Just use negative lookarounds:

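    String textToParse = "I paid 1.000 dollars. Very expensive. But I think today it should be cheaper.";
    // Split on a dot that has no digit directly before it and no digit directly after it.
    String[] chunks = textToParse.split("(?<!\\d)\\.(?!\\d)");
    for (String chunk : chunks) {
        // trim() removes the space left at the start of each chunk
        System.out.println(chunk.trim());
    }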

Explanation:

I used a negative lookahead, which asserts that what follows does not match the given pattern, so `(?!\d)` ensures that the dot matches only if it is NOT followed by a digit (`\d`).

I also used a negative lookbehind, which works the same way but looks at what precedes the dot instead of what follows it. So `(?<!\d)` ensures that the character before the dot is not a digit.
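
To see what each lookaround contributes, here is a small comparison sketch (the class and method names are made up for illustration). It uses a sentence that ends in a number: with both lookarounds the dot after "2022" is not treated as a separator, because a digit precedes it, while the lookahead-only pattern still splits there and still leaves "1.000" intact.

```java
import java.util.Arrays;

class LookaroundDemo {
    // Both lookarounds: the dot must have no digit before it and no digit after it.
    static String[] splitBoth(String text) {
        return text.split("(?<!\\d)\\.(?!\\d)");
    }

    // Lookahead only: the dot must not be followed by a digit.
    static String[] splitLookaheadOnly(String text) {
        return text.split("\\.(?!\\d)");
    }

    public static void main(String[] args) {
        String text = "I paid 1.000 dollars in 2022. Very expensive.";
        // One chunk: neither the dot in "1.000" nor the one after "2022" is a separator.
        System.out.println(Arrays.toString(splitBoth(text)));
        // Two chunks: the dot after "2022" is a separator, the one in "1.000" is not.
        System.out.println(Arrays.toString(splitLookaheadOnly(text)));
    }
}
```

Which variant fits depends on whether sentences in the input can end with a number.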

Michał Turczyn

Regular expressions can be slow when the input string is very long. As an alternative, you can visit each character, check whether a dot is followed by a space, and split there, e.g.:

    public static void main(String[] args) {
        String str = "I paid 1.000 dollars. Very expensive. But I think today it should be cheaper.";
        StringBuilder sb = new StringBuilder(64);

        int i = 0, length = str.length();
        // Stop one character early so that str.charAt(i + 1) is always valid.
        for (; i < length - 1; i++) {
            char ch = str.charAt(i);
            // A dot followed by a space ends a sentence; the dot in "1.000" is followed by a digit.
            if (ch == '.' && str.charAt(i + 1) == ' ') {
                System.out.println(sb.append(ch));
                sb.setLength(0); // Reset buffer
                i++; // Skip the space after the dot
                continue;
            }

            sb.append(ch);
        }

        // Append the remaining characters and print the final sentence.
        System.out.println(sb.append(str.substring(i)));
    }
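
The same character scan can also return the sentences instead of printing them. The sketch below is one way to do that; the class and method names are hypothetical. Instead of requiring a space after the dot, it treats a dot as a sentence end unless the next character is a digit, as suggested in the comments on the question. It does not handle abbreviations or ellipses.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceScanner {
    // Hypothetical helper: ends a sentence at a '.' unless the next character is a digit.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            current.append(ch);
            boolean nextIsDigit = i + 1 < text.length() && Character.isDigit(text.charAt(i + 1));
            if (ch == '.' && !nextIsDigit) {
                flush(sentences, current);
            }
        }
        flush(sentences, current); // trailing text without a final dot
        return sentences;
    }

    // Adds the trimmed buffer content to the list (if not empty) and resets the buffer.
    private static void flush(List<String> out, StringBuilder buffer) {
        String sentence = buffer.toString().trim();
        if (!sentence.isEmpty()) {
            out.add(sentence);
        }
        buffer.setLength(0);
    }

    public static void main(String[] args) {
        String text = "I paid 1.000 dollars. Very expensive. But I think today it should be cheaper.";
        splitSentences(text).forEach(System.out::println);
    }
}
```

For the example sentence this yields the three sentences with their periods.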

Find Bugs