I have spent two days trying to solve the following problem: I have a MatchCollection. The pattern contains a group, and I want a list of the values captured by that group (there are two or more matches).

string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "^<tr>$?<td>$?[D-M][i-r],[' '][0-3][1-9].[0-1][1-9].[0-9][0-9]$?</td>$?<td>$?([1-9][0-2]?)$?</td>$?";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);

foreach (Match match in mc2)
{
     GroupCollection groups = match.Groups;
     string s = groups[1].Value;
     Datum2.Text = s;
}

But only the last match (2) appears in the TextBox "Datum2". I know that I have to use something like a ListBox, but Groups[1].Value is a string...

Thanks for your help and time. Dieter

Dieter Müller
  • You are overwriting Datum2.Text with s on every iteration; nothing is appended to it. If you want to see all matches, you can write Datum2.Text = s + Datum2.Text – lazy Sep 28 '15 at 14:36
  • But Datum2 is an empty textbox. – Dieter Müller Sep 28 '15 at 14:47
  • Only the first match. – Mark Jansen Sep 28 '15 at 14:57
  • Have you considered using tools other than regex to parse data out of HTML strings? Have you heard of HtmlAgilityPack? – Wiktor Stribiżew Sep 28 '15 at 14:57
  • There is a loop over the match collection. If you want to print all matches in the text box, either convert the [match collection to string array](http://stackoverflow.com/questions/11416191/how-to-convert-matchcollection-to-string-array) or use Datum2.Text = s + Datum2.Text – lazy Sep 28 '15 at 15:04

1 Answer

The first thing to correct in the code is that Datum2.Text = s; overwrites the text in Datum2 on every iteration, so with more than one match only the last value remains.

Now, about your regex,

  • ^ anchors the match at the beginning of the string (or of each line, if RegexOptions.Multiline is set), so there is really only 1 match. If you remove it, it'll match twice.
  • I can't tell what was intended with the $? scattered all over the pattern. An optional end-of-string anchor has no effect here, so just take them out.
  • [' '] matches either a quote, a space, or a quote (there is no need to repeat characters in a character class).
  • All dots in [0-3][1-9].[0-1][1-9].[0-9][0-9] need to be escaped. A dot matches any character otherwise.
  • [0-1][1-9] matches all months except "10". The second character should be [0-9] (or \d). The same applies to the day: [0-3][1-9] misses 10, 20 and 30.
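
The effect of the anchor and of the unescaped dots can be checked with a short sketch. This is only an illustration, not part of the original answer: the class name RegexPitfallsDemo, its members, the shortened two-row input, and the test string "09x09x15" are all made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and sample data are assumptions for illustration.
public static class RegexPitfallsDemo
{
    // Two rows, shortened from the question's input.
    public const string Input =
        "<tr><td>Mi, 09.09.15</td><td>1</td></tr>" +
        "<tr><td>Mi, 09.09.15</td><td>2</td></tr>";

    // Date with escaped dots, followed by the captured number.
    private const string Body =
        @"<tr><td>[D-M][i-r], [0-3][0-9]\.[0-1][0-9]\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";

    // Number of matches when the pattern starts with ^.
    public static int CountAnchored() => Regex.Matches(Input, "^" + Body).Count;

    // Number of matches without the anchor.
    public static int CountUnanchored() => Regex.Matches(Input, Body).Count;

    // True if the date pattern accepts the text; escapeDots switches between \. and .
    public static bool DateMatches(string text, bool escapeDots)
    {
        string dot = escapeDots ? @"\." : ".";
        string pattern = "^[0-3][0-9]" + dot + "[0-1][0-9]" + dot + "[0-9][0-9]$";
        return Regex.IsMatch(text, pattern);
    }

    public static void Main()
    {
        Console.WriteLine(CountAnchored());                            // 1
        Console.WriteLine(CountUnanchored());                          // 2
        Console.WriteLine(DateMatches("09x09x15", escapeDots: false)); // True
        Console.WriteLine(DateMatches("09x09x15", escapeDots: true));  // False
    }
}
```

With the anchor, only the row at the very start of the string can match. With unescaped dots, a string such as "09x09x15" is accepted as a date.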

Code:

string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "<tr><td>[D-M][i-r],[' ][0-3][0-9]\\.[0-1][0-9]\\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
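string s = "";

foreach (Match match in mc2)
{
     // Groups[1] is the first capture group: the number in the second <td>
     GroupCollection groups = match.Groups;
     s = s + " " + groups[1].Value;
}

// Assign once, after the loop; Trim() drops the leading space
Datum2.Text = s.Trim();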

Output:

1 2
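
Since the question asks for a list rather than a single string, the group values could also be collected with LINQ. The following is a minimal sketch, not the only way to do it: the class GroupValues and the method Collect are hypothetical names, and the pattern is the corrected one from above, written as a verbatim string with a plain space in place of the character class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are assumptions for illustration.
public static class GroupValues
{
    // Returns the value of capture group 1 for every match, in match order.
    public static List<string> Collect(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()                  // gives LINQ a typed sequence of Match
                    .Select(m => m.Groups[1].Value) // keep only the first capture group
                    .ToList();
    }

    public static void Main()
    {
        string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
        string pattern = @"<tr><td>[D-M][i-r], [0-3][0-9]\.[0-1][0-9]\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";

        List<string> values = Collect(input, pattern);
        Console.WriteLine(string.Join(" ", values)); // 1 2
    }
}
```

In a Windows Forms application, the resulting list could then be handed to a ListBox, for example with listBox1.Items.AddRange(values.ToArray()), where listBox1 is an assumed control name.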


Keep in mind that regex is not the right tool for parsing HTML. It'll work for simple cases, but for real-world input do consider using HTML Agility Pack.
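
For comparison, a rough sketch of the same extraction with HTML Agility Pack might look like the following. It assumes the HtmlAgilityPack NuGet package is installed. The class RowNumbers and the method SecondCells are hypothetical names, and the XPath //tr/td[2] assumes the wanted number is always in the second cell of each row.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

// Hypothetical helper; the names are assumptions for illustration.
public static class RowNumbers
{
    // Returns the trimmed text of the second <td> of every <tr>.
    public static List<string> SecondCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection cells = doc.DocumentNode.SelectNodes("//tr/td[2]");
        if (cells != null)
        {
            foreach (HtmlNode cell in cells)
            {
                result.Add(cell.InnerText.Trim());
            }
        }

        return result;
    }
}
```

Unlike the regex, this does not depend on the exact date format in the first cell, only on the position of the cell within the row.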

Mariano