<tbody id="clavier:infractionList2:tb">
    <tr class="rich-table-row rich-table-firstrow ">
    ..............
    ..............
    ............

    </tr>
</tbody>  

I'm looking for a regex to extract this tbody block from a large text.

I tried this one, but it returns no result:

#<tbody id=\"clavier:infractionList2:tb\">(.*)</tbody>#
Josh Darnell
elbaz
  • Could you please improve your question, so that we can help you? Right now it's pretty impossible to understand what you want. – Sergio Tulentsev Dec 22 '11 at 11:33
  • Have you added the multiline flag in order for `.` to match multiple lines (hence the name)? – jensgram Dec 22 '11 at 11:34
  • @jensgram: You're thinking of the **single-line** flag. It allows the `.` to match newline characters, which it normally doesn't. – Alan Moore Jan 04 '12 at 15:35
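The difference between the two flags can be checked directly. Below is a minimal sketch in Python, whose `re.MULTILINE` and `re.DOTALL` correspond to the multiline and single-line flags discussed above; the sample text and variable names are made up for illustration.

```python
import re

# Sample input modeled on the question; the row content is a placeholder.
text = '<tbody id="clavier:infractionList2:tb">\n    <tr>row</tr>\n</tbody>'
pattern = r'<tbody id="clavier:infractionList2:tb">(.*)</tbody>'

# No flags: "." stops at the first newline, so the pattern cannot span lines.
no_flags = re.search(pattern, text)

# MULTILINE only changes what ^ and $ match; "." still excludes newlines.
multiline = re.search(pattern, text, re.MULTILINE)

# DOTALL (the "single-line" / s flag) lets "." match newlines as well.
dotall = re.search(pattern, text, re.DOTALL)

print(no_flags)                   # no match
print(multiline)                  # no match either
print(dotall.group(1).strip())    # the markup between the tbody tags
```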

3 Answers

2

Using regex on HTML is often a bad idea, because tags can be nested. Have you tried using an XML/HTML parser? For example, XmlDocument, XmlElement and XmlAttribute.

EDIT: The problem with regex and html in your example:

  • A regex cannot keep count of nested tbody tags.
  • Will the tbody tag always look like <tbody>...</tbody>, or can it also be <tbody .../>?
  • Even if you know there will be exactly one start and one end tag, how do you know there won't be any plain text containing "tbody" somewhere inside the table, thus breaking the regex?
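To illustrate the parser route, here is a rough sketch using Python's standard `html.parser`. The `TbodyExtractor` class is a hypothetical helper written for this example, not part of any library, and the sample markup is invented. It counts nested tbody tags, which is exactly what the regex cannot do. Note that it rebuilds the markup from parser events, so comments are dropped and character references arrive decoded.

```python
from html.parser import HTMLParser


class TbodyExtractor(HTMLParser):
    """Collects the markup inside the tbody with a given id (illustrative helper)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # nesting depth of tbody tags, 0 = outside the target
        self.parts = []   # markup collected while inside the target

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "tbody":
                self.depth += 1
            self.parts.append(self.get_starttag_text())
        elif tag == "tbody" and dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept as written.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "tbody":
            self.depth -= 1
            if self.depth == 0:
                return  # closing tag of the target itself, not part of the content
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


html = ('<table><tbody id="clavier:infractionList2:tb"><tr class="a"><td>x</td></tr></tbody>'
        '<tbody id="tbody2"><tr><td>y</td></tr></tbody></table>')

parser = TbodyExtractor("clavier:infractionList2:tb")
parser.feed(html)
parser.close()
inner = "".join(parser.parts)
print(inner)
```

Only the first tbody's content is collected; the second tbody is ignored because its id does not match.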
Peet Brits
  • Sorry, I must use regex to solve this problem; I don't have a choice. – elbaz Dec 22 '11 at 11:39
  • update your question with what you really want to do, maybe there's a different approach. – Peet Brits Dec 22 '11 at 12:15
  • @user1111614: So this is homework? Then [provide this answer](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) to be awarded bonus points. – Benoit Dec 22 '11 at 12:15
0

You may want to tell your regex engine that the . should match newlines as well; by default it does not.

In PHP, that would make the regex:

#<tbody id=\"clavier:infractionList2:tb\">(.*)</tbody>#s

Note the trailing s (PCRE's DOTALL modifier).

Warning: if there are two tbody elements, this regex will match everything from the first tbody (the one with this ID) through the last closing tbody tag (whatever its ID), because (.*) is greedy.

Example:

<tbody id="clavier:infractionList2:tb">Some data</tbody>
<tbody id="tbody2"></tbody>

will be matched as a single span, from the first opening tag through the last closing tag.
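The greedy behaviour described in the warning can be reproduced with a short sketch in Python, where `re.DOTALL` plays the role of the trailing s. The input is the two-tbody example from above; the variable names are illustrative. A lazy quantifier `(.*?)` stops at the first closing tag instead.

```python
import re

text = ('<tbody id="clavier:infractionList2:tb">Some data</tbody>\n'
        '<tbody id="tbody2"></tbody>')

# Greedy: (.*) runs to the LAST </tbody> in the text.
greedy = re.search(r'<tbody id="clavier:infractionList2:tb">(.*)</tbody>',
                   text, re.DOTALL)

# Lazy: (.*?) stops at the FIRST </tbody> after the opening tag.
lazy = re.search(r'<tbody id="clavier:infractionList2:tb">(.*?)</tbody>',
                 text, re.DOTALL)

print(repr(greedy.group(1)))  # spills over into the second tbody
print(repr(lazy.group(1)))    # only the content of the first tbody
```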

Tom van der Woerdt
-1

This works:

/<tbody id="clavier:infractionList2:tb">(.*?)<\/tbody>/is

Or full PHP:

<?php
$html = '<tbody id="clavier:infractionList2:tb">
    <tr class="rich-table-row rich-table-firstrow ">
    ..............
    ..............
    ............

    </tr>
</tbody>  ';

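// i = case-insensitive, s = let . match newlines; (.*?) is lazy, so it stops at the first </tbody>
preg_match_all('/<tbody id="clavier:infractionList2:tb">(.*?)<\/tbody>/is', $html, $matches);

// $matches[1] holds the contents of the first capture group for every match
var_dump($matches[1]);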

That gives you the <tr...>....</tr> as a result. If you only want the dots you'll need to use something like:

/<tbody id="clavier:infractionList2:tb">.*?<tr.*?>(.*?)<\/tr>.*?<\/tbody>/is
powerbuoy
  • Sorry, I use C# and System.Text.RegularExpressions.Regex to get the value. I tested your answer but got no match, just a null result. Thanks. – elbaz Dec 23 '11 at 09:05
  • I believe the regexp should work regardless of programming language: `/<tbody id="clavier:infractionList2:tb">(.*?)<\/tbody>/is` or `/<tbody id="clavier:infractionList2:tb">.*?<tr.*?>(.*?)<\/tr>.*?<\/tbody>/is` – powerbuoy Dec 23 '11 at 11:04
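One possible reason for the empty result, offered as an assumption since the C# code is not shown: the leading and trailing `/` and the `is` suffix are PHP's delimiter syntax, not part of the pattern. Engines that take a bare pattern string expect the flags to be passed separately; in .NET the counterparts are `RegexOptions.IgnoreCase` and `RegexOptions.Singleline`. A minimal sketch of the same idea in Python, with illustrative variable names and a shortened copy of the sample markup:

```python
import re

html = '''<tbody id="clavier:infractionList2:tb">
    <tr class="rich-table-row rich-table-firstrow ">
    ..............
    </tr>
</tbody>  '''

# Flags are passed as an argument instead of a trailing "is".
FLAGS = re.IGNORECASE | re.DOTALL

tbody_re = re.compile(
    r'<tbody id="clavier:infractionList2:tb">(.*?)</tbody>', FLAGS)
tr_re = re.compile(
    r'<tbody id="clavier:infractionList2:tb">.*?<tr.*?>(.*?)</tr>.*?</tbody>', FLAGS)

tbody_inner = tbody_re.findall(html)   # the <tr ...>...</tr> markup
tr_inner = tr_re.findall(html)         # only what sits inside the <tr>

# The PHP literal pasted verbatim: the slashes and "is" are now matched
# as ordinary characters, so nothing is found.
php_literal = re.findall(
    r'/<tbody id="clavier:infractionList2:tb">(.*?)<\/tbody>/is', html)

print(tbody_inner[0].strip())
print(tr_inner[0].strip())
print(php_literal)
```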