I'm using HerokuApp to build a regular expression that matches part of the content of an XML document.

I'm not trying to parse the XML, only to extract a piece of it.

<xml> <balise1> </balise1> <table> <tr> <td> cas1 </td> <td> cas2 </td> </tr> <tr> <td> new </td> <td> line </td> </tr> </table> </xml>

This is the pattern I wrote to match the content of tr tags, with the help of this documentation:

(?<content>(<tr>(.)*</tr>))

The output of this regular expression is:

{
  "content": [
    [
      "<tr> <td> cas1 </td> <td> cas2 </td> </tr> <tr> <td> new </td> <td> line </td> </tr>"
    ]
  ]
}

whereas I want it to be:

{
  "content": [
    [
      "<tr> <td> cas1 </td> <td> cas2 </td> </tr>"
    ]
  ]
}

The problem seems to be that the match does not end at the first closing tr tag but runs on to the last one.

How can I specify that "any number of characters" must not contain a new tr tag?

Do you have any suggestions?

vdolez
  • This should do the trick: `(?<content>(<tr>(.)*?</tr>))`, i.e. non-greedy matching after the opening tr tag. – collapsar Mar 19 '15 at 11:38
  • Thanks for that! I now realize this wasn't the output I wanted. What if I want the output to be the list of all the `<tr>` elements and not just the first one? (If you post this as an answer, I'll accept it.) – vdolez Mar 19 '15 at 13:30
  • I edited the output I wanted. I actually want to list all occurrences of a tr tag. – vdolez Mar 19 '15 at 14:18

1 Answer

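As collapsar's comment points out, I used a greedy operator where I should have used a reluctant (non-greedy) one: `*` matches as much as possible, so the match ran on to the last `</tr>`, whereas `*?` stops at the first one. This document explains the syntax of the operators.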
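To make the difference concrete, here is a minimal sketch in Python rather than in the tool used in the question. It assumes Python's `re` module, where a named group is written `(?P<content>...)` instead of `(?<content>...)`. The variable names are only illustrative. The last pattern is one possible way to say "any characters, but no new `<tr>`", using a negative lookahead. It is an alternative idea, not something taken from the linked documentation.

```python
import re

# The sample document from the question, on a single line.
xml = (
    "<xml> <balise1> </balise1> <table> "
    "<tr> <td> cas1 </td> <td> cas2 </td> </tr> "
    "<tr> <td> new </td> <td> line </td> </tr> "
    "</table> </xml>"
)

# Greedy: .* grabs as much as it can, so the match ends at the LAST </tr>.
greedy = re.search(r"(?P<content><tr>.*</tr>)", xml).group("content")

# Reluctant: .*? grabs as little as it can, so the match ends at the FIRST </tr>.
lazy = re.search(r"(?P<content><tr>.*?</tr>)", xml).group("content")

# To list every row, apply the reluctant pattern repeatedly.
rows = re.findall(r"<tr>.*?</tr>", xml)

# Alternative: explicitly forbid a new <tr> inside the match.
# (?!<tr>). consumes one character only if no "<tr>" starts at that position.
no_nested = re.findall(r"<tr>(?:(?!<tr>).)*?</tr>", xml)

print(greedy)
print(lazy)
print(rows)
```

In Python, `.` does not match a newline by default, so for a document spread over several lines the `re.DOTALL` flag would be needed. Other regex engines may behave differently.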

EDIT: I updated the link to the documentation, as it has changed since.

vdolez