How to get text after ">" symbol except HTML tag

Question

I need to replace the > symbol in text like this: > here is text, so that it becomes: something here is text

But my regexp also replaces the > inside HTML tags: <div class="class">text</div>

The result is: <div class="class" something text </div something

Regexp: \>(?=(.*?))

I tried to exclude HTML tags with a negative lookbehind: (?<!.+\<)\>(?=(.*?)) but it doesn't work.

How can I fix it?

Thanks in advance!

For example: `
here is text
> need to parse it` Result should be: `
here is text
something need to parse it` — qwerty, Aug 19 '14 at 11:52
Do you want to change the text in between the tags or outside? Your question seems to say you want to change the text between the tags, but this comment suggests the text outside of the tags (also, there appears to be an extra `>` at the end?). – skamazin, Aug 19 '14 at 11:57
http://regex101.com/r/tP2cS6/1 Please take a look. Sorry for my bad English. – qwerty, Aug 19 '14 at 12:01
@qwerty Perhaps something like [this](http://regex101.com/r/sL8vF1/10). It's not very robust, I just want to make sure I'm on the right track here... — skamazin, Aug 19 '14 at 12:03

score 2 · Accepted Answer · answered Aug 19 '14 at 11:59

Regex:

/<(\w+)\b.*?>.*?<\/\1>(*SKIP)(*F)|>/gs

Replacement string:

something
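
For readers without a PCRE engine at hand, here is a minimal sketch of the same "match the tag, then discard that match" idea in Python. Python's standard `re` module has no `(*SKIP)(*F)` verbs, so this swaps in a different technique: the tag alternative is matched and returned unchanged by a callback, and only the bare `>` alternative is replaced. The names `PATTERN` and `replace_bare_gt` are made up for illustration, and the pattern uses the variant with the optional closing tag from the comments on this answer.

```python
import re

# Alternative 1: an opening tag, optionally followed by content and its
# matching closing tag. Alternative 2 (group 2): a bare ">".
PATTERN = re.compile(r"<(\w+)\b.*?>(?:.*?</\1>)?|(>)", re.DOTALL)


def replace_bare_gt(text, replacement="something"):
    """Replace ">" characters that are not consumed by the tag alternative."""

    def swap(match):
        # Group 2 is set only when the bare ">" alternative matched.
        if match.group(2) is not None:
            return replacement
        # Otherwise a tag (or whole element) matched: emit it unchanged.
        return match.group(0)

    return PATTERN.sub(swap, text)


print(replace_bare_gt('<div class="class">text</div>> here is text'))
```

Like the original pattern, this is not a real HTML parser: a `>` inside an attribute value, for example, would still end the "tag" too early.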

Avinash Raj


what about `
>here is text`
– Braj Aug 19 '14 at 12:00
@user3218114 how about this `<(\w+)\b.*?>(?:.*?<\/\1>)?(*SKIP)(*F)|>`? http://regex101.com/r/sL8vF1/8 – Avinash Raj Aug 19 '14 at 12:02
Wow! Amazing! Thank you so much, it's a really difficult regexp. Could you tell me about SKIP? – qwerty Aug 19 '14 at 12:04
@RandomSeed `>` present outside the div tag got replaced. – Avinash Raj Aug 19 '14 at 12:06
@AvinashRaj Do you need that `\b`? Wouldn't the `(\w+)` capture all the way up to the boundary anyways? Or am I missing something? – skamazin Aug 19 '14 at 12:07
@qwerty `<(\w+)\b.*?>(?:.*?<\/\1>)?` matches the whole tag. The `(*SKIP)(*F)` that follows makes that match fail and stops the engine from retrying inside the text it just consumed. The regex after the `|` symbol is then matched against the remaining string. In our case it's `>` – Avinash Raj Aug 19 '14 at 12:08
I believe it still doesn't work with a real HTML document (e.g. http://regex101.com/r/sL8vF1/11) But you still have my +1 for this mind-twisting RegEx. – RandomSeed Aug 19 '14 at 12:22

score 1 · Answer 2 · edited May 23 '17 at 12:13

Parsing HTML with a regex is (almost) mission impossible. I would rather fully parse the HTML with the built-in PHP features.

Once you have separated the tags from the contents, it becomes trivial to apply the changes you need (possibly with a regex if you really want to :)
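
As a rough illustration of the "parse first, then change only the contents" approach, here is a sketch that uses Python's standard `html.parser` as a stand-in for the PHP features mentioned above (this is an assumption for the example, not what the answer links to). The class `GtReplacer` and the function `replace_gt_in_text` are hypothetical names. Tags are re-emitted as the parser saw them, and the replacement is applied to text nodes only.

```python
from html.parser import HTMLParser


class GtReplacer(HTMLParser):
    """Hypothetical helper: re-emits the markup, changing only text nodes."""

    def __init__(self, replacement="something"):
        # convert_charrefs=False keeps entities such as &amp; out of handle_data
        super().__init__(convert_charrefs=False)
        self.replacement = replacement
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())  # original tag text

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Only real text reaches this method, so a plain replace is enough.
        self.parts.append(data.replace(">", self.replacement))

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")  # re-emit named entities as written

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")  # re-emit numeric references

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def replace_gt_in_text(html, replacement="something"):
    parser = GtReplacer(replacement)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(replace_gt_in_text('<div class="class">text</div>> here is text'))
```

Because the parser decides where a tag ends, a `>` inside a quoted attribute value is left alone. The sketch does not special-case `<script>` or `<style>` contents, which the parser also reports as data, so those would need extra handling in a real document.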

Community

answered Aug 19 '14 at 11:53

RandomSeed


I do not need to parse HTML with a regexp. I need to replace text starting with the `>` symbol with `something text`, but it should not replace HTML tags. – qwerty Aug 19 '14 at 11:54
You do need to parse HTML to some extent, to differentiate tag ends (e.g. ``), attribute values (e.g. ``) and real text (e.g. ` contents here > `). – RandomSeed Aug 19 '14 at 11:56
Also notice that, formally speaking, the character `>` is not supposed to appear as plain text inside an HTML document. It is supposed to be replaced by its HTML entity (`&gt;`). You may want to replace such entities too. – RandomSeed Aug 19 '14 at 11:59