I want to extract a sequence out of some text.
The sequence begins with "Diagnostic-Code:". The middle part can be any characters, even across multiple lines, and the end is marked by an empty line (the text continues after that, but it is not part of the desired sequence).
The following pattern works for the beginning and the middle part, but it finds the end too late:
(?s)Diagnostic-Code: (.+)\n\n
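A likely reason the end is found too late is that .+ is greedy: with (?s) it runs to the end of the text and backtracks only as far as the last empty line. Below is a minimal sketch of the reluctant variant (.+?), which stops at the first empty line instead. The class name, the helper method and the LF-only sample text are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyDiagnosticCode {
    // Same pattern as above, but with a reluctant quantifier (.+?),
    // so the match ends at the FIRST empty line after the header.
    static final Pattern LAZY =
            Pattern.compile("(?s)Diagnostic-Code: (.+?)\\n\\n");

    // Returns the captured diagnostic text, or null if nothing matches.
    static String extract(String text) {
        Matcher m = LAZY.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Illustrative sample with LF line endings and two empty lines.
        String text = "Status: 5.0.0\n"
                + "Diagnostic-Code: X-Postfix; test.com\n"
                + "*this*\n*should*\n*be included too*\n"
                + "\n"
                + "--EA7634814EFB9.1516804532/mail.example.com\n"
                + "Content-Description: Undelivered Message\n"
                + "\n"
                + "more text\n";
        System.out.println(extract(text));
    }
}
```

This sketch only handles "\n" line endings; text with "\r\n" line endings would need a different terminator.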
The string looks something like this:
...
Status: 5.0.0
Diagnostic-Code: X-Postfix; test.com
*this*
*should*
*be included too*
--EA7634814EFB9.1516804532/mail.example.com
Content-Description: Undelivered Message
...
--------- edit ---------
Thank you for the answer, @Gurman!
However, java.util.regex somehow behaves differently from regex101.com for this message:
Action: failed
Status: 5.1.1
Remote-MTA: dns; gmail-smtp-in.l.google.com
Diagnostic-Code: smtp; 550-5.1.1 The email account that you tried to reach does
not exist. Please try 550-5.1.1 double-checking the recipient's email
address for typos or 550-5.1.1 unnecessary spaces. Learn more at 550 5.1.1
https://support.google.com/mail/?p=NoSuchUser u11si15276978wru.314 - gsmtp
--E8A363093CEC.1520529178/proxy03.hostname.net
Content-Description: Undelivered Message
Content-Type: message/rfc822
Return-Path: <no-reply@hostname.net>
On regex101 the pattern matches the whole multiline diagnostic code, but Java matches only the first line as group 1:
smtp; 550-5.1.1 The email account that you tried to reach does
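One possible explanation, assuming the message handed to Java uses CRLF ("\r\n") line endings while the text pasted into regex101 uses LF only (neither is confirmed here): the terminator [\r\n]{2} in the Java pattern accepts any two line-break characters, so a single "\r\n" already satisfies it and the reluctant match ends after the first line. The sketch below reproduces that behavior and compares it with a hypothetical alternative terminator, (?:\r?\n){2}, which requires two complete line breaks. The class name, helper method and sample message are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CrlfDiagnosticCode {
    // Pattern from the question: [\r\n]{2} is satisfied by ONE CRLF pair.
    static final Pattern ORIGINAL =
            Pattern.compile("(?i)diagnostic[-| ]Code: ([\\s\\S]*?[\\r\\n]{2})");

    // Hypothetical alternative: two complete line breaks, LF or CRLF,
    // kept outside the capturing group.
    static final Pattern TWO_BREAKS =
            Pattern.compile("(?i)diagnostic[- ]code: ([\\s\\S]*?)(?:\\r?\\n){2}");

    // Returns capturing group 1 of the first match, or null if none.
    static String firstGroup(Pattern p, String message) {
        Matcher m = p.matcher(message);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Illustrative message with CRLF line endings (an assumption).
        String message = "Status: 5.1.1\r\n"
                + "Diagnostic-Code: smtp; 550-5.1.1 first line\r\n"
                + "second line\r\n"
                + "\r\n"
                + "--boundary\r\n";
        // Stops after the first line of the diagnostic code.
        System.out.println(firstGroup(ORIGINAL, message).trim());
        // Runs up to the empty line.
        System.out.println(firstGroup(TWO_BREAKS, message));
    }
}
```

Note that group(1) returns only the captured text, whereas group(0) returns the entire match, including the "Diagnostic-Code: " label.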
The Java code:
diagnosticCodePatter = Pattern.compile("(?i)diagnostic[-| ]Code: ([\\s\\S]*?[\\r\\n]{2})");
matcher = diagnosticCodePatter.matcher(message);
if (matcher.find()) {
diagnosticCode = matcher.group(0);