I have the following RTF string: \af31507 \ltrch\fcs0 \insrsid6361256 Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}}{\rtlch\fcs1 \af31507 \ltrch\fcs0 \insrsid12283827 and I want to extract the Study Title and its content, i.e. (Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}). Below is my code:

String[] arr = value.split("\\s+");
//System.out.println(arr.length);
for(int j=0; j<arr.length; j++) {
    if(isNumeric(arr[j])) {
         arr[j] = "\\?" + arr[j];
    }
}

In the code above, I split the string on whitespace and iterate over the resulting array to check whether each token is a number. However, the isNumeric function cannot handle the 8000 that follows the last \u8805, because that token comes through as 8000}}{\rtlch\fcs1. How can I find the Study Title and its content using a regex?

shanky
  • Suggest you use an RTF parser. See [Java RTF Parser](https://stackoverflow.com/q/17223903/5221149). – Andreas Jan 29 '18 at 01:29
  • Hi @Andreas, I could use an RTF parser, but I'm not sure I would get the Unicode escapes back, and I need them because I want to update the contents of my Study Title string. That's why I'm not using an RTF parser: it would give me the plain text without those Unicode escapes. – shanky Jan 29 '18 at 01:35

1 Answer

The regex Study Title: \{[^}]*\} will match what you expect (in Java the { has to be escaped). Demo: https://regex101.com/r/FZl1WL/1

    String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
    Pattern p = Pattern.compile("Study Title: \\{[^}]*\\}");
    Matcher m = p.matcher(s);
    while (m.find()) {
        System.out.println(m.group());
    }

output:

Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}

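Update, as requested by the OP: to get only the content, without the "Study Title: " prefix and the surrounding braces, use a lookbehind and a lookahead: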

    String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
    Pattern p = Pattern.compile("(?<=Study Title: \\{)[^}]*(?=\\})");
    Matcher m = p.matcher(s);
    while (m.find()) {
        System.out.println(m.group());
    }

output:

Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000
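
As an alternative to the lookarounds, a capturing group gives the same content from a single pattern. The following is only a minimal sketch: the class name StudyTitleExtractor and the method extractTitle are hypothetical names chosen for illustration, and it assumes the title itself never contains a closing brace, since [^}]* stops at the first }.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class StudyTitleExtractor {
    // Group 1 captures everything between the braces that follow "Study Title: ".
    private static final Pattern TITLE =
            Pattern.compile("Study Title: \\{([^}]*)\\}");

    /** Returns the text inside the braces, or empty if no study title is found. */
    public static Optional<String> extractTitle(String rtf) {
        Matcher m = TITLE.matcher(rtf);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: "
                + "{Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}"
                + "{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
        System.out.println(extractTitle(s).orElse("(no title found)"));
    }
}
```

Here m.group() would still return the whole match including "Study Title: {" and "}", while m.group(1) returns only the content, so one pattern covers both cases.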
Shen Yudong
  • Hi @yudong, thanks for the answer. How can I get only the content, i.e. 'Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000', without the "Study Title: " prefix and without the opening '{' and closing '}'? – shanky Jan 29 '18 at 02:02
  • @shanky, just update the pattern to (?<=Study Title: \\{)[^}]*(?=\\}) – Shen Yudong Jan 29 '18 at 02:06
  • Hi @yudong, it's giving me a java.util.regex.PatternSyntaxException: Illegal repetition error – shanky Jan 29 '18 at 02:07
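
One plausible cause of the Illegal repetition error (an assumption, since the failing code was not posted) is that the { reached the regex engine unescaped. Java treats { after a character as the start of a quantifier such as {2,5}, and rejects the pattern when no digit follows. The sketch below, with the hypothetical class name IllegalRepetitionDemo, compares the unescaped and escaped forms:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, for illustration only.
public class IllegalRepetitionDemo {
    /** Returns true if Java accepts the regex, false if compilation fails. */
    public static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // For an unescaped '{' the message starts with "Illegal repetition".
            return false;
        }
    }

    public static void main(String[] args) {
        // '{' is not escaped, so the engine tries to read a quantifier and fails.
        String unescaped = "(?<=Study Title: {)[^}]*(?=})";
        // In a Java string literal, "\\{" produces the regex \{ (a literal brace).
        String escaped = "(?<=Study Title: \\{)[^}]*(?=\\})";

        System.out.println("unescaped compiles: " + compiles(unescaped));
        System.out.println("escaped compiles:   " + compiles(escaped));
    }
}
```

If the pattern is built from a fixed prefix, Pattern.quote("Study Title: {") is another way to avoid escaping the brace by hand.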