I am having trouble with a generated file and would like to make some substitutions.

Say I have this pattern:

<ul/><htmlelement>some text</htmlelement>

I want my regex to capture the value of some text. Since I can already match the element name htmlelement with a regex, I want to reuse that match inside the same regex, like:

preg_match_all("#<ul/><([^><])>(.)*</(first capuring match)>#", $string, $matches);

Do you have a solution?

Krish Munot
user2626210
  • Possible duplicate of [How do you parse and process HTML/XML in PHP?](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) – chris85 Jan 09 '17 at 14:10
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Alessandro Da Rugna Jan 09 '17 at 16:59

1 Answer

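  1. You are missing the + quantifier in the group that matches the "htmlelement" opening tag name; without it, [^><] matches only a single character.

  2. The * needs to be inside the capture group, otherwise the group captures only the last character.

  3. It is better to make it non-greedy with ?, so the match stops at the first closing tag.

  4. Refer to the "first capturing match" with the backreference \1.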

So the regex should be:

<ul\/><([^><]+)>(.*?)<\/\1>
             ^    ^^     ^
             1    23     4

Demo: https://regex101.com/r/f25N9J/1
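
For reference, here is a minimal PHP sketch of the regex above used with preg_match. The sample string is an assumption: the trailing ` bla</htmlelement>` is invented only to show why the non-greedy `?` matters. The `#` delimiter is taken from the question, so `/` does not need escaping.

```php
<?php
// Hypothetical input: the extra " bla</htmlelement>" is made up for illustration.
$string = '<ul/><htmlelement>some text</htmlelement> bla</htmlelement>';

// ([^><]+) captures the tag name, (.*?) lazily captures the content,
// and \1 requires the closing tag to repeat the captured tag name.
$lazy = '#<ul/><([^><]+)>(.*?)</\1>#';

// Same pattern with a greedy (.*) for comparison.
$greedy = '#<ul/><([^><]+)>(.*)</\1>#';

preg_match($lazy, $string, $m);
echo $m[1], ' => ', $m[2], "\n"; // htmlelement => some text

preg_match($greedy, $string, $g);
echo $g[2], "\n"; // some text</htmlelement> bla
```

The lazy version stops at the first closing tag, while the greedy one runs to the last. By default `.` does not match newlines, so content spanning several lines would need the `s` modifier.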

Dmitry Egorov
  • Thank you for your help. In the second capturing parenthesis, I would also like to avoid matching the 1> string, so something like <[^><+]>(everything that does not contain \1>)\1> (the escape \/ is not necessary in my PHP regex) ... – user2626210 Jan 09 '17 at 15:26
  • Sorry, I don't quite understand the point of avoiding the 1> in the second capture group. Could you please provide a data sample and the expected result? – Dmitry Egorov Jan 09 '17 at 15:29
  • some text bla , I want the regex to capture some text, but not some text bla ... – user2626210 Jan 09 '17 at 15:45
  • This regex does so: https://regex101.com/r/f25N9J/2. That's why the non-greedy `?` (point 3) is required. – Dmitry Egorov Jan 10 '17 at 04:40