Regex exclude character between string tag

Question

So I have a string that I want to search through using a regex, and not any other method like DOMDocument etc.

Example:

<div class="form-item form-type-textarea form-item-answer2">
<div class="form-textarea-wrapper resizable"><textarea id="edit-answer2" name="answer2" cols="60" rows="5" class="form-textarea">
this is some text
</textarea>
</div>
</div>

Desired:

this is some text

So what I want is to use a single regex and be left with 'this is some text', which is not fixed and will be dynamic. I will then pass the pattern through preg_replace to get the desired outcome.

My current regex is:

/div class="form-item.*class="form-textarea">$\A<\/textarea>.*<\/div>/gU

I have tried using the end of string and start of string anchors, but to no avail.
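
A short sketch of why the anchors cannot help here: `\A` only matches at the very start of the subject string, and `$` (without the `m` modifier) only at the very end or before a final newline, so once the pattern has consumed `div class=...` neither can succeed. The sketch assumes the pattern starts with a `/` delimiter and drops the `g` flag, which PHP's preg functions do not accept (`preg_match_all` is used for "find all" instead). Replacing `$\A` with a lazy capture group is one possible way to make the pattern match.

```php
<?php
// Sample markup from the question (nowdoc: no variable interpolation).
$html = <<<'HTML'
<div class="form-item form-type-textarea form-item-answer2">
<div class="form-textarea-wrapper resizable"><textarea id="edit-answer2" name="answer2" cols="60" rows="5" class="form-textarea">
this is some text
</textarea>
</div>
</div>
HTML;

// The question's pattern with a "/" delimiter and without "g".
// \A can only match at offset 0, so after "div class=..." it always fails.
$anchored = '/div class="form-item.*class="form-textarea">$\A<\/textarea>.*<\/div>/U';
$before = preg_match($anchored, $html);     // 0: no match

// Replace "$\A" with optional whitespace around a lazy capture group.
$captured = '/div class="form-item.*class="form-textarea">\s*(.*?)\s*<\/textarea>.*<\/div>/s';
$after = preg_match($captured, $html, $m);  // 1: match found

echo $m[1], "\n"; // this is some text
```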

hek2mgl · Answer 1 · 2014-07-02T15:58:44.250

0

Don't parse HTML with regexes. Use a DOM parser:

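$doc = new DOMDocument();
$doc->loadHTML($html);                            // $html holds the markup to search

$textarea = $doc->getElementById("edit-answer2"); // find the element by its id attribute
echo $textarea->nodeValue;                        // print its text content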

If you want to modify the value:

$textarea->nodeValue = "foo bar";
$html = $doc->saveHTML();
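
A self-contained version of the DOM approach, assuming the question's markup as input. `libxml_use_internal_errors(true)` keeps libxml from printing parser warnings, which HTML fragments often trigger; `trim()` is an addition here because the textarea content may include the line breaks around the text.

```php
<?php
// Sample markup from the question.
$html = <<<'HTML'
<div class="form-item form-type-textarea form-item-answer2">
<div class="form-textarea-wrapper resizable"><textarea id="edit-answer2" name="answer2" cols="60" rows="5" class="form-textarea">
this is some text
</textarea>
</div>
</div>
HTML;

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Look the textarea up by its id and read its text content.
$textarea = $doc->getElementById('edit-answer2');
$value = trim($textarea->nodeValue); // drop surrounding line breaks
echo $value, "\n";

// Replace the textarea's contents and serialise the document again.
$textarea->nodeValue = 'foo bar';
$out = $doc->saveHTML();
```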

edited Jul 02 '14 at 15:58

answered Jul 02 '14 at 15:49

hek2mgl

You have posted a sufficient amount of [similar answers](http://stackoverflow.com/questions/24082980/regex-php-get-only-bold-section-in-a-string), and could have just closevoted instead of continuing the DOM-answer-on-regex **tag rot**. – mario Jul 02 '14 at 15:58
Oh. I wouldn't have expected such feedback. You are right, I have answered related questions lots of times. But you see, it will get asked again and again (and again), and if there aren't 5 guys who see it like you, such a post will likely get answered, and accepted, with a regex solution. Do you think that would be better? – hek2mgl Jul 02 '14 at 16:01
No, but I believe a canonical question might be better than repeating the endless cycle. While the general direction of the answer is technically correct, offering an alternative copy+paste answer hasn't stopped, and never will stop, the influx of people asking for black-box solutions. A more objective (not conflating parsing and matching), tutorial-style (including the obvious overhead of getting a regex right) and didactic explanation of when to use which might help more. – mario Jul 02 '14 at 16:06
@mario After reading some meta posts about this, also from you, I now realized that my answer was the victim of some DOM against REGEX *war*. I think this is childish and my answer is valid. Nothing more to say. – hek2mgl Jul 03 '14 at 10:16
So, uh, you reopened this repetitive question because you didn't like the canonical DOM reference? Yet you accuse me of your interpretation of some war against that? And you couldn't be bothered to closevote because every badly researched question needs to be rewarded with near-identical posts, because OP can't be expected to substitute a single id? - Do not summon me again. – mario Jul 03 '14 at 11:05
@mario You are right with "there are tons of material on how to obtain an element's value by its id". The "DOM-answer-on-regex tag rot" discussion distracted me from that. I've closed it. – hek2mgl Jul 03 '14 at 11:27

Avinash Raj · Answer 2 · 2014-07-02T16:18:00.557

0

Your regex would be:

/<textarea id[^>]*>\n([^\n]*)/s

OR

/<textarea id[^>]*>(.*?)(?=<\/textarea>)/s

Captured group 1 contains the string 'this is some text'.
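
A minimal PHP sketch of the two capture-group patterns above, run with `preg_match` against the question's markup. The `g` flag is left off because PHP's preg functions reject it; `preg_match` stops at the first match, while `preg_match_all` would return all of them. Note that group 1 of the lookahead version still carries the newlines around the text.

```php
<?php
// Sample markup from the question.
$html = <<<'HTML'
<div class="form-item form-type-textarea form-item-answer2">
<div class="form-textarea-wrapper resizable"><textarea id="edit-answer2" name="answer2" cols="60" rows="5" class="form-textarea">
this is some text
</textarea>
</div>
</div>
HTML;

// Pattern 1: the single line right after the opening tag.
preg_match('/<textarea id[^>]*>\n([^\n]*)/s', $html, $m1);

// Pattern 2: everything up to the closing tag, checked with a lookahead.
preg_match('/<textarea id[^>]*>(.*?)(?=<\/textarea>)/s', $html, $m2);

// Group 1 of pattern 2 includes the surrounding newlines, so trim it.
$first  = $m1[1];
$second = trim($m2[1]);
echo $first, "\n", $second, "\n";
```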

OR

You could use the regex below to match only the string 'this is some text'.

/div class="form-item.*class="form-textarea">[^\n]*\n\K[^\n]*/s
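
With `\K` there is no group to read: the reported match (`$m[0]`) begins where `\K` appears, because everything matched before it is discarded. A sketch assuming the question's sample markup; it still relies on the text sitting on its own line directly after the opening tag.

```php
<?php
// Sample markup from the question.
$html = <<<'HTML'
<div class="form-item form-type-textarea form-item-answer2">
<div class="form-textarea-wrapper resizable"><textarea id="edit-answer2" name="answer2" cols="60" rows="5" class="form-textarea">
this is some text
</textarea>
</div>
</div>
HTML;

// \K resets the start of the reported match, so $m[0] holds only the line after the tag.
preg_match('/div class="form-item.*class="form-textarea">[^\n]*\n\K[^\n]*/s', $html, $m);
echo $m[0], "\n"; // this is some text
```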

edited Jul 02 '14 at 16:18

answered Jul 02 '14 at 15:49

Avinash Raj

A regex isn't the best choice here. Have you seen http://stackoverflow.com/a/1732454/171318 ? – hek2mgl Jul 02 '14 at 15:51
@hek2mgl op wants only a regex solution.`So i have a string that i want to search through using regex, and not any other method like domDocument etc.` – Avinash Raj Jul 02 '14 at 15:51
OPs usually want the most obscure things, but you need to teach them. Your solution a) is highly fragile, as it requires the newlines to be present in the HTML source, and b) will not work if there are multiple textareas in the document (which is likely, as the node has an id attribute). – hek2mgl Jul 02 '14 at 15:53
Whilst the regex you have given outputs my matches in groups, what I am looking to do is pass the regex pattern through preg_replace in order to delete the outer tag text and be left with only the desired text. As this is grouping the desired text, running it through preg_replace removes the text that I want to keep. – Key Jul 03 '14 at 08:58
But you said the desired output was only `this is some text`. – Avinash Raj Jul 03 '14 at 09:03
Which text did you want to keep? Post the desired output and I'll help you. – Avinash Raj Jul 03 '14 at 09:18
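
Key's follow-up is about `preg_replace`: it removes whatever the pattern matches, so a pattern that matches only the wanted text deletes exactly that text. One way around this, sketched here under the assumption that the input holds a single textarea like the sample, is to match the entire wrapper and use the captured group as the replacement. The pattern starts at `<div` rather than `div`; otherwise the leading `<` would be left in the output.

```php
<?php
// Sample markup from the question.
$html = <<<'HTML'
<div class="form-item form-type-textarea form-item-answer2">
<div class="form-textarea-wrapper resizable"><textarea id="edit-answer2" name="answer2" cols="60" rows="5" class="form-textarea">
this is some text
</textarea>
</div>
</div>
HTML;

// Match the whole wrapper (including the leading "<") and capture only the textarea body.
// Replacing the full match with '$1' keeps the captured text and drops the markup around it.
$pattern = '/<div class="form-item.*class="form-textarea">\s*(.*?)\s*<\/textarea>.*<\/div>/s';
$result  = preg_replace($pattern, '$1', $html);
echo $result, "\n"; // this is some text
```

With several textareas in one string, the greedy `.*` parts would run across them, which is the fragility hek2mgl points out; the DOM lookup by id does not have that problem.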
