
I need to replace the > in text like this: > here is text, with: something

But my regexp also replaces the > inside HTML tags, e.g. <div class="class">text</div>

The result is: <div class="class" something text </div something

Regexp: \>(?=(.*?))

I tried to exclude HTML tags with a negative lookbehind: (?<!.+\<)\>(?=(.*?)) but it doesn't work.

How can I fix it?

Thanks in advance!

qwerty

2 Answers


Regex:

/<(\w+)\b.*?>.*?<\/\1>(*SKIP)(*F)|>/gs

Replacement string:

something

DEMO

Avinash Raj
  • what about `>here is text`? – Braj Aug 19 '14 at 12:00
  • @user3218114 how about this `<(\w+)\b.*?>(?:.*?<\/\1>)?(*SKIP)(*F)|>`? http://regex101.com/r/sL8vF1/8 – Avinash Raj Aug 19 '14 at 12:02
  • Wow! Amazing! Thank you so much, it's a really difficult regexp. Could you tell me about SKIP? – qwerty Aug 19 '14 at 12:04
  • @RandomSeed `>` present outside the div tag got replaced. – Avinash Raj Aug 19 '14 at 12:06
  • @AvinashRaj Do you need that `\b`? Wouldn't the `(\w+)` capture all the way up to the boundary anyways? Or am I missing something? – skamazin Aug 19 '14 at 12:07
  • @qwerty `<(\w+)\b.*?>(?:.*?<\/\1>)?` matches the whole tag. The following `(*SKIP)(*F)` makes that match fail, and the text it consumed is not tried again. The part of the regex after the `|` symbol is then matched against the remaining string. In our case that's `>` – Avinash Raj Aug 19 '14 at 12:08
  • I believe it still doesn't work with a real HTML document (e.g. http://regex101.com/r/sL8vF1/11) But you still have my +1 for this mind-twisting RegEx. – RandomSeed Aug 19 '14 at 12:22
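
For readers whose regex engine lacks `(*SKIP)(*F)`, here is a rough sketch of the same idea in Python (a swapped-in language; the thread itself is about PCRE). Python's `re` module has no backtracking verbs, so a callback stands in for them: the tag branch is matched first and returned unchanged, and only a bare `>` is replaced. The function name `replace_bare_gt` is made up for this illustration, and the pattern is the one from the comment above.

```python
import re

# First alternative: a whole tag (plus its closing tag, if one follows).
# Second alternative: a bare ">".
PATTERN = re.compile(r'<(\w+)\b.*?>(?:.*?</\1>)?|>', re.DOTALL)


def replace_bare_gt(text: str, replacement: str = "something") -> str:
    """Replace '>' characters that are not consumed by the tag alternative."""

    def substitute(match: re.Match) -> str:
        # Group 1 is set only when the tag alternative matched:
        # hand the tag back untouched (this plays the role of SKIP/FAIL).
        if match.group(1) is not None:
            return match.group(0)
        return replacement

    return PATTERN.sub(substitute, text)


sample = '<div class="class">text</div> > here is text'
print(replace_bare_gt(sample))
# <div class="class">text</div> something here is text
```

As the last comment points out, this kind of pattern is still fragile on real documents (nested elements with the same tag name, for instance), so treat it as a sketch for simple fragments only.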

Parsing HTML with a regex is (almost) mission impossible. I would rather fully parse the HTML with PHP's built-in features.

Once you have separated tags from contents, applying the changes you need becomes trivial (usage example here), possibly with a regex if you really want to :)
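
To make the "separate tags from contents" idea concrete, here is a minimal sketch. It uses Python's standard `html.parser` rather than the PHP parser this answer has in mind, so it shows the approach in a different language, not the code the answer links to. The class `TextGtReplacer` and the function `replace_gt_in_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextGtReplacer(HTMLParser):
    """Re-emits the markup and replaces '>' only inside text nodes."""

    def __init__(self, replacement: str) -> None:
        # convert_charrefs=False keeps entities such as &gt; as written.
        super().__init__(convert_charrefs=False)
        self.replacement = replacement
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as it appeared.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The end tag is rebuilt, so its original case/spacing is not kept.
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Only text content reaches this handler; tags never do.
        self.parts.append(data.replace(">", self.replacement))

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def replace_gt_in_text(html: str, replacement: str = "something") -> str:
    parser = TextGtReplacer(replacement)
    parser.feed(html)
    parser.close()  # flush anything still buffered
    return "".join(parser.parts)


print(replace_gt_in_text('<div class="class">text</div> > here is text'))
# <div class="class">text</div> something here is text
```

The parser decides what is a tag and what is text, so the replacement itself can stay a plain string operation.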

RandomSeed
  • I do not need to parse HTML with a regexp. I need to turn text containing the `>` symbol into `something text`, without replacing the `>` of HTML tags. – qwerty Aug 19 '14 at 11:54
  • You do need to parse HTML to some extent, to differentiate tag ends, attribute values, and real text (e.g. `contents here >`). – RandomSeed Aug 19 '14 at 11:56
  • Also notice that, formally speaking, the character `>` is forbidden inside an HTML document. It is supposed to be replaced by its HTML entity (`&gt;`). You may want to replace such entities too. – RandomSeed Aug 19 '14 at 11:59
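
A small illustration of the entity point, again in Python as an assumed stand-in language. It takes a string that is already known to be text content (not markup) and replaces both spellings of the character. `replace_gt_forms` is a made-up helper name.

```python
import html
import re

# Either spelling of the character: the entity or the raw symbol.
GT_EITHER_FORM = re.compile(r"&gt;|>")


def replace_gt_forms(text_node: str, replacement: str = "something") -> str:
    """Replace '>' and '&gt;' in a string that is plain text, not markup."""
    return GT_EITHER_FORM.sub(replacement, text_node)


print(html.escape("a > b"))              # a &gt; b
print(replace_gt_forms("a &gt; b > c"))  # a something b something c
```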