How to capture text between XML Summary?

Question

I have single-line and multi-line XML summary comments that look like these:

/// <summary> This is a single-line XML comment. </summary> 

/// <summary> This is a multi-line XML comment.
/// These are additional lines with more text.
/// Some more of these text. </summary>

/// <summary> This is another XML text summary with a different
/// format.
/// </summary>

In RegexBuddy, how would I capture the text inside them, without the /// prefixes and the <summary> </summary> tags?

I came up with the following to capture a multi-line XML summary:

  ((\s*(///)\s*((<summary>)?))(.*))+(</summary>)$

and this one for a single-line XML summary:

  \s*///\s*(<summary>).*(</summary>)$

But I've no idea how to capture just the text.

What regular expression would capture just the text, so that I can use it as a backreference in the replacement text?

Thank you in advance.

I don't think it can be done with a single regex. I think you need to extract all the text between `<summary>` and `</summary>` and then remove the `///` strings. And don't forget the `dotall` flag for multi-line summaries. I can post a solution in Java if that is appropriate and relevant. — Abra, Dec 25 '20 at 07:34
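A minimal sketch of that two-step idea in Python. This is an assumption: the standard `re` module stands in for RegexBuddy or Java, and `summary_texts`, `SUMMARY_RE` and `SLASHES_RE` are hypothetical names. The first pattern uses `re.DOTALL` so a non-greedy `(.*?)` can span several lines; the second removes the `///` prefix from each remaining line.

```python
import re

# Step 1: non-greedy body between the tags; re.DOTALL lets "." cross newlines.
SUMMARY_RE = re.compile(r"<summary>(.*?)</summary>", re.DOTALL)
# Step 2: a leading "///" (plus one optional space or tab) at each line start.
SLASHES_RE = re.compile(r"^[ \t]*///[ \t]?", re.MULTILINE)


def summary_texts(source: str) -> list[str]:
    """Return the text of every <summary> block, one string per block."""
    results = []
    for match in SUMMARY_RE.finditer(source):
        body = SLASHES_RE.sub("", match.group(1))
        # Collapse the remaining non-blank lines into one space-separated string.
        lines = [line.strip() for line in body.splitlines() if line.strip()]
        results.append(" ".join(lines))
    return results


SAMPLE = """/// <summary> This is a single-line XML comment. </summary>

/// <summary> This is a multi-line XML comment.
/// These are additional lines with more text.
/// Some more of these text. </summary>

/// <summary> This is another XML text summary with a different
/// format.
/// </summary>
"""

for text in summary_texts(SAMPLE):
    print(text)
```

Joining the lines with spaces is just one choice; keep `body` as-is if the original line breaks matter.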
Do you want to get the remaining text? Try replacing `^///\s*(?:<summary>\s*)?|\s*</summary>` with an empty string, see [this demo](https://regex101.com/r/e7MSz7/1). — Wiktor Stribiżew, Dec 25 '20 at 10:59
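The same replacement as a Python sketch, assuming the pattern runs with the multiline flag so that `^` matches at every line start. Python's `re` stands in for the demo's engine, and `strip_summary_markup` is a hypothetical name.

```python
import re

# Alternative 1: "///" at a line start, plus an optional "<summary>" and the
# whitespace after it. Alternative 2: "</summary>" with any whitespace before it.
MARKUP_RE = re.compile(r"^///\s*(?:<summary>\s*)?|\s*</summary>", re.MULTILINE)


def strip_summary_markup(source: str) -> str:
    """Replace every prefix/tag match with an empty string, keeping the text."""
    return MARKUP_RE.sub("", source)


SAMPLE = """/// <summary> This is a single-line XML comment. </summary>

/// <summary> This is a multi-line XML comment.
/// These are additional lines with more text.
/// Some more of these text. </summary>

/// <summary> This is another XML text summary with a different
/// format.
/// </summary>
"""

print(strip_summary_markup(SAMPLE))
```

One caveat: `\s*` also matches line breaks, so a line consisting of a bare `///` would lose its newline; `[ \t]*` in place of the first `\s*` avoids that if it matters.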

score 1 · Accepted Answer · answered Dec 25 '20 at 20:40

Use the PCRE engine:

(?:^///\s*(?:<summary>)?|</summary>)(*SKIP)(*F)|(?:(?!</?summary>|^///(?!/)\s*).)+

See proof

Explanation

--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    ^                        the beginning of a line (with the m flag)
--------------------------------------------------------------------------------
    ///                      '///'
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      <summary>                '<summary>'
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    </summary>               '</summary>'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
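  (*SKIP)                  'SKIP' verb: if the rest of the pattern
                           fails, the next attempt starts here, so
                           the text consumed so far is skipped
--------------------------------------------------------------------------------
  (*F)                     'FAIL' verb, forces a failure, which
                           triggers the SKIP above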
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (1 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      <                        '<'
--------------------------------------------------------------------------------
      /?                       '/' (optional (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      summary>                 'summary>'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      ^                        the beginning of a line (with the m flag)
--------------------------------------------------------------------------------
      ///                      '///'
--------------------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
        /                        '/'
--------------------------------------------------------------------------------
      )                        end of look-ahead
--------------------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ")
                               (0 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )+                       end of grouping
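Python's standard `re` module does not support the `(*SKIP)` and `(*F)` verbs. A common workaround, sketched below as an assumption rather than as part of the answer, keeps both branches but captures the wanted one: a match of the unwanted branch is still consumed, so the search resumes after it, which for this pattern should have the same effect as `(*SKIP)(*F)`. Only matches where group 1 participated are kept. `summary_fragments` is a hypothetical name, and `re.MULTILINE` plays the role of the `m` flag.

```python
import re

# Branch 1 (unwanted, not captured): "///" prefix with an optional "<summary>",
# or a closing "</summary>".
# Branch 2 (wanted, group 1): a run of characters on one line that does not
# start a summary tag or a "///" prefix. Same pieces as the PCRE pattern, minus the verbs.
FRAGMENT_RE = re.compile(
    r"(?:^///\s*(?:<summary>)?|</summary>)"
    r"|((?:(?!</?summary>|^///(?!/)\s*).)+)",
    re.MULTILINE,
)


def summary_fragments(source: str) -> list[str]:
    """Return the trimmed, non-blank fragments captured by branch 2."""
    return [
        m.group(1).strip()
        for m in FRAGMENT_RE.finditer(source)
        if m.group(1) is not None and m.group(1).strip()
    ]


SAMPLE = """/// <summary> This is a single-line XML comment. </summary>

/// <summary> This is a multi-line XML comment.
/// These are additional lines with more text.
/// Some more of these text. </summary>

/// <summary> This is another XML text summary with a different
/// format.
/// </summary>
"""

for fragment in summary_fragments(SAMPLE):
    print(fragment)
```

Like the PCRE version, this yields one fragment per line, because `.` does not match a newline, so a multi-line summary still has to be joined afterwards.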
