I have a text file as below:

1.1 - Hello
1.2 - world!
2.1 - Some
data
here and it contains some 32 digits so i cannot use \D+
2.2 - Etc..

I want a regex that returns 4 matches in this case, one for each numbered point. My regex doesn't work the way I'd like. Please advise:

private readonly Regex _reactionRegex = new Regex(@"(\d+)\.(\d+)\s*-\s*(.+)", RegexOptions.Compiled | RegexOptions.Singleline);

Even this regex doesn't help:

(\d+)\.(\d+)\s*-\s*(.+)(?<!\d+\.\d+)
Alan Moore
Alex Zhukovskiy
  • You need to show what you expect to capture. – nPn May 20 '14 at 18:09
  • Should your 3rd match be `"2.1 - Some data here and it contains some 32 digits so i cannot use \D+ "`? – sergiogarciadev May 20 '14 at 18:12
  • FYI I changed the answer and [online demo](https://ideone.com/MGgxQK) so that it shows you not just how to get the text part of the string: `Hello`, but also the digits if you want: `1.1 - Hello`. So you now have both options. Take a look at the bottom of the demo, it shows the output of the code. – zx81 May 20 '14 at 20:56
  • @nPn I wanted a capture that starts with `\d+\.\d+` and matches everything up to the next `\d+\.\d+`. I got an answer, and it does exactly what I was describing. – Alex Zhukovskiy May 22 '14 at 07:19

2 Answers

Alex, this regex will do it:

(?sm)^\d+\.\d+\s*-\s*((?:.(?!^\d+\.\d+))*)

This assumes that you want to capture the text of each point without the numbers, for instance just Hello.

If you want to also capture the digits, for instance 1.1 - Hello, you can use the same regex and display the entire match, not just Group 1. The online demo below will show you both.

How does it work?

  1. The idea is to capture the text you want in Group 1 using (parentheses).
  2. We match in multi-line mode m so that the anchor ^ matches at the start of each line.
  3. We match in dotall mode s so that the dot also matches newlines, which lets a point span several lines.
  4. We use a negative lookahead (?!...) to stop consuming characters when what follows is the start of a line that begins with your digit marker.
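
To see what the lookahead in step 4 buys you, here is a minimal sketch that counts the matches found by the pattern from the question and by the pattern above on the same input. The names LookaheadDemo and CountMatches are made up for illustration, and the input uses explicit \n line breaks (an assumption: with \r\n line endings a trailing \r would remain in the captured text).

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaheadDemo
{
    // Sample input from the question, with explicit \n line breaks.
    public const string Input =
        "1.1 - Hello\n1.2 - world!\n2.1 - Some\ndata\n" +
        "here and it contains some 32 digits so i cannot use \\D+\n2.2 - Etc..";

    // Hypothetical helper: number of matches a pattern finds in Input.
    public static int CountMatches(string pattern, RegexOptions options = RegexOptions.None)
    {
        return Regex.Matches(Input, pattern, options).Count;
    }

    static void Main()
    {
        // Pattern from the question: in Singleline mode the greedy .+
        // runs to the end of the input, so everything lands in one match.
        Console.WriteLine(CountMatches(@"(\d+)\.(\d+)\s*-\s*(.+)", RegexOptions.Singleline));

        // Pattern from this answer: the negative lookahead stops each
        // match just before the next line that starts with a marker.
        Console.WriteLine(CountMatches(@"(?sm)^\d+\.\d+\s*-\s*((?:.(?!^\d+\.\d+))*)"));
    }
}
```

On this input the first call should report a single match and the second should report four.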

Here is full working code and an online demo.

using System;
using System.Text.RegularExpressions;
using System.Collections.Specialized;

class Program {

static void Main() {
    // Verbatim string: the text lines must stay unindented
    string yourstring = @"1.1 - Hello
1.2 - world!
2.1 - Some
data
here and it contains some 32 digits so i cannot use \D+
2.2 - Etc..";
    var resultList = new StringCollection();
    try {
        // (?s): the dot matches newlines; (?m): ^ matches at the start of every line
        var yourRegex = new Regex(@"(?sm)^\d+\.\d+\s*-\s*((?:.(?!^\d+\.\d+))*)");
        Match matchResult = yourRegex.Match(yourstring);
        while (matchResult.Success) {
            // Group 1 holds the text of the point without its number marker
            resultList.Add(matchResult.Groups[1].Value);
            Console.WriteLine("Whole Match: " + matchResult.Value);
            Console.WriteLine("Group 1: " + matchResult.Groups[1].Value + "\n");
            matchResult = matchResult.NextMatch();
        }
    } catch (ArgumentException) {
        // Syntax error in the regular expression
    }

    Console.WriteLine("\nPress Any Key to Exit.");
    Console.ReadKey();
} // END Main
} // END Program
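
If you also want the two numbers as separate values, as the three groups in your original regex suggest, one option is to put them in named groups. This is only a sketch: the names PointParser, Parse, major, minor and text are invented for illustration, and the tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PointParser
{
    // Same pattern as above, with the two numbers and the text in named groups.
    private static readonly Regex PointRegex = new Regex(
        @"(?sm)^(?<major>\d+)\.(?<minor>\d+)\s*-\s*(?<text>(?:.(?!^\d+\.\d+))*)");

    // Returns one (Major, Minor, Text) tuple per numbered point.
    public static List<(int Major, int Minor, string Text)> Parse(string input)
    {
        var points = new List<(int Major, int Minor, string Text)>();
        foreach (Match m in PointRegex.Matches(input))
        {
            points.Add((int.Parse(m.Groups["major"].Value),
                        int.Parse(m.Groups["minor"].Value),
                        m.Groups["text"].Value));
        }
        return points;
    }

    static void Main()
    {
        // Shortened sample input with explicit \n line breaks.
        string input = "1.1 - Hello\n1.2 - world!\n2.1 - Some\ndata\n2.2 - Etc..";
        foreach (var p in Parse(input))
        {
            Console.WriteLine($"{p.Major}.{p.Minor}: {p.Text}");
        }
    }
}
```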
zx81
  • Thanks a lot, that's what I was looking for! And explanation is really really helpful! :) – Alex Zhukovskiy May 22 '14 at 07:17
  • @AlexJoukovsky It was a pleasure, glad to hear it worked for you! If you want to see more simple regex tricks, you might enjoy [this question](http://stackoverflow.com/questions/23589174/match-or-replace-a-pattern-except-in-situations-s1-s2-s3-etc/23589204#23589204), I had a lot of fun with it. :) – zx81 May 22 '14 at 10:15

This may do what you're looking for, though the expected result is somewhat ambiguous.

(\d+)\.(\d+)\s*-\s*(.+?)(\n)(?=\d|$)

For example, what would you expect to match if the data looked like this:

1.1 - Hello
1.2 - world!
2.1 - Some
data here and it contains some 
32 digits so i cannot use \D+
2.2 - Etc..

It's not clear whether the line starting with 32 begins a new record or not.
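
One way to settle this, sketched below, is to assume that a record starts only where a line begins with the full marker: digits, a dot, digits, then a dash. Under that assumption the line starting with 32 stays inside record 2.1. The names RecordCounter and FindRecords are hypothetical, and the pattern is a variation written for this sketch rather than the one shown above.

```csharp
using System;
using System.Text.RegularExpressions;

static class RecordCounter
{
    // Assumption: a record starts only where a line begins with "digits.digits -".
    // The lazy (.+?) stops before the next such line or at the end of the input.
    private static readonly Regex RecordRegex = new Regex(
        @"(?sm)^(\d+)\.(\d+)\s*-\s*(.+?)(?=\n\d+\.\d+\s*-|\z)");

    public static MatchCollection FindRecords(string input)
    {
        return RecordRegex.Matches(input);
    }

    static void Main()
    {
        // The ambiguous sample, with explicit \n line breaks.
        string input =
            "1.1 - Hello\n1.2 - world!\n2.1 - Some\n" +
            "data here and it contains some \n" +
            "32 digits so i cannot use \\D+\n2.2 - Etc..";

        foreach (Match m in FindRecords(input))
        {
            // Brackets make the extent of each record's text visible.
            Console.WriteLine("[" + m.Groups[3].Value + "]");
        }
    }
}
```

If a line such as "32 digits" should instead start a new record, the lookahead would need to be loosened to accept any line that begins with a digit.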

LB2