1

I am using this pattern:

const string ptnBodytext = @"<p>\s*(.+?)\s*</p>";

to extract the text within the <p> tags. It works fine except for text that contains newlines, e.g.:

<p>
    Lorem ipsum
    second line or
    third one?
</p>

How can I change the pattern so that it also matches newlines, tabs and so on?

Manfred Radlwimmer
Ras

2 Answers

4

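You either need to activate dotall mode (RegexOptions.Singleline in .NET), so that . also matches newlines, or use a character class that matches any character: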

const string ptnBodytext = @"<p>([\s\S]+?)</p>";

See a demo on regex101.com.
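
As a minimal sketch of both options in C#, assuming the input is a plain string with simple <p> tags and no attributes: the class name ParagraphExtractor, the method Extract and the flag useSingleline are made up for illustration. Both variants keep the \s* from the question so that the captured text is trimmed of surrounding whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ParagraphExtractor
{
    // Option 1: keep the original pattern and let '.' match '\n' via Singleline.
    private static readonly Regex DotAll =
        new Regex(@"<p>\s*(.+?)\s*</p>", RegexOptions.Singleline);

    // Option 2: [\s\S] matches any character, whatever options are set.
    private static readonly Regex AnyChar =
        new Regex(@"<p>\s*([\s\S]+?)\s*</p>");

    // Returns the trimmed inner text of every <p>...</p> found in html.
    public static List<string> Extract(string html, bool useSingleline = true)
    {
        Regex regex = useSingleline ? DotAll : AnyChar;
        var result = new List<string>();
        foreach (Match m in regex.Matches(html))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

Calling ParagraphExtractor.Extract on the multi-line paragraph from the question should return a single string that starts at "Lorem ipsum" and ends at "third one?", with the inner line breaks preserved.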

Jan
2

Just remove the \s*:

const string ptnBodytext = @"<p>(.+?)</p>";
Dmitry Egorov
    [**Not** true](https://regex101.com/r/yG5hW3/1) without [`DOTALL`](https://regex101.com/r/yG5hW3/2) mode. Additionally, `\s*` matches *zero* or more whitespace characters. – Jan Aug 23 '16 at 11:47
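
The point made in this comment can be checked with a small sketch in C#. The class DotNewlineDemo and its method are hypothetical names used only for this illustration; the pattern is the one from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of any library.
public static class DotNewlineDemo
{
    // Tests the pattern from the answer with the given regex options.
    public static bool MatchesParagraph(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, @"<p>(.+?)</p>", options);
    }
}
```

With the input "<p>\nLorem\n</p>", MatchesParagraph should return false for RegexOptions.None, because . stops at the first newline, and true for RegexOptions.Singleline.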