
I have a target string like:

   "addr: line1 
         line2

      tel:12345678"

Note: between "line2" and "tel" there may be one or more line breaks (\r\n, \r\n\r\n, or more). The result I want is:

   "addr: line1 
         line2"

with no \r\n after line2.

My questions are:

1) If I use

/addr[\s\S]+(?=(\r\n)+tel)/

I get the address without "tel", but I can't get rid of the "\r\n"s after "line2". How can I do that?

2) I know [\s\S] matches any character, including \r and \n, and (.|\n|\r) does too. But why doesn't [.\n] work? Isn't its syntax just like [\s\S]?
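For the second question, the key point is that inside a character class "." loses its special meaning and matches only a literal period, while \s and \S keep theirs. So [\s\S] is "whitespace or non-whitespace", which is every character, but [.\n] is only "a period or a newline". A minimal sketch in JavaScript (Node.js), chosen because the question writes its pattern as a /.../ literal; the sample string is hypothetical:

```javascript
// Hypothetical sample: two lines separated by "\r\n", ending with a period.
const text = "addr: line1\r\nline2.";

// Inside [...] "." is a literal period, so [.\n] matches only "." and "\n".
// Here it finds just the "\n" and the trailing "."; the letters and "\r" are skipped.
const literalDotOrNewline = text.match(/[.\n]+/g);

// \s and \S keep their meaning inside brackets; together they cover every character.
const anyChar = text.match(/[\s\S]+/)[0];

// Alternation also covers everything ("." here excludes \r and \n, so both are
// listed explicitly), but it is usually slower than a single character class.
const alternation = text.match(/(?:.|\n|\r)+/)[0];

// The "s" (dotAll) flag makes "." itself match line breaks,
// like RegexOptions.Singleline in .NET.
const dotAll = text.match(/.+/s)[0];

console.log(literalDotOrNewline.length); // 2
console.log(anyChar === text, alternation === text, dotAll === text); // true true true
```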

Thank you very much!

Wiktor Stribiżew
a_pp

1 Answer


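You need to make the first "+" lazy (non-greedy) so that it does not also consume the whitespace before "tel". A greedy quantifier grabs as much as it can and gives characters back only until the lookahead succeeds, so it stops as late as possible; a lazy one stops at the first position where the lookahead succeeds, which is right after "line2". With RegexOptions.Singleline, "." also matches \r and \n: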

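  using System.Text.RegularExpressions;
  // `text` is the input string; Singleline lets "." match \r and \n, and "+?" stops at the first spot followed by whitespace + "tel"
  var regex = new Regex(@"addr.+?(?=\s+tel)", RegexOptions.Singleline);
  var result = regex.Match(text).Value;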
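A minimal JavaScript (Node.js) sketch comparing the greedy and lazy versions, assuming a hypothetical input with no indentation before "tel" so that the asker's (\r\n)+tel lookahead can match. The "s" flag is JavaScript's counterpart of RegexOptions.Singleline; lazy quantifiers and lookaheads behave the same way in .NET:

```javascript
// Hypothetical input: two address lines, then three "\r\n" line breaks before "tel".
const input = "addr: line1\r\nline2\r\n\r\n\r\ntel:12345678";

// Greedy: [\s\S]+ runs to the end of the string and backtracks only until the
// lookahead succeeds, which first happens just before the LAST "\r\n".
const greedy = input.match(/addr[\s\S]+(?=(\r\n)+tel)/)[0];

// Lazy: [\s\S]+? grows one character at a time and stops at the FIRST position
// where the lookahead succeeds, i.e. right after "line2".
const lazy = input.match(/addr[\s\S]+?(?=(\r\n)+tel)/)[0];

// The answer's pattern, with the "s" flag standing in for RegexOptions.Singleline.
const answer = input.match(/addr.+?(?=\s+tel)/s)[0];

console.log(JSON.stringify(greedy)); // "addr: line1\r\nline2\r\n\r\n"
console.log(JSON.stringify(lazy));   // "addr: line1\r\nline2"
console.log(JSON.stringify(answer)); // "addr: line1\r\nline2"
```

Only the greedy version keeps the extra line breaks; both lazy patterns end right after "line2".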
Klaus Gütter
  • Added a demo on [regex101.com](https://regex101.com/r/U35eq7/1), +1. – Jan Sep 21 '19 at 06:13
  • Thank you, Klaus! Now I understand that adding a ? makes + non-greedy, and it works perfectly (although I had to change your pattern to "addr[\s\S]+?(?=(\r\n)+tel)" to match my string exactly). By the way, do you have any idea about my second question? – a_pp Sep 21 '19 at 06:25
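The two lookaheads accept different input: (\r\n)+tel requires "tel" to follow the "\r\n" breaks immediately, whereas \s+tel also tolerates spaces, tabs and bare "\n" line endings. A JavaScript sketch with two hypothetical inputs (the first mimics the indented layout shown in the question; the helper firstMatch is illustrative only):

```javascript
// Hypothetical inputs: the first has "tel" indented after "\r\n" breaks,
// the second uses bare "\n" line endings instead of "\r\n".
const indented = "addr: line1\r\n      line2\r\n\r\n      tel:12345678";
const unixEndings = "addr: line1\nline2\n\ntel:12345678";

// Hypothetical helper: returns the matched text, or null when there is no match.
function firstMatch(re, s) {
  const m = s.match(re);
  return m ? m[0] : null;
}

const strict = /addr[\s\S]+?(?=(\r\n)+tel)/; // the asker's adapted pattern
const loose = /addr[\s\S]+?(?=\s+tel)/;      // any whitespace before "tel"

console.log(firstMatch(strict, indented));    // null: spaces sit between "\r\n" and "tel"
console.log(firstMatch(strict, unixEndings)); // null: there is no "\r\n" at all
console.log(JSON.stringify(firstMatch(loose, indented)));    // "addr: line1\r\n      line2"
console.log(JSON.stringify(firstMatch(loose, unixEndings))); // "addr: line1\nline2"
```

The stricter lookahead is useful when the input format is fixed and "tel" must start a line; the looser one is more forgiving of indentation and line-ending differences.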