substring in html

Question

<p style="color:red;font-size:12px;">This economy car is great value for money and with the added benefit of air conditioning is ideal for couples and small families. A ?500 excess applies which can be waived to NIL for only <b>5.00</b> per day</p>

I tried the two methods below:

substr($mytext,0,25);

and

 $s = html_entity_decode($mytext);
 $sub = substr($s, 0, 50);

I need to get the first 50 characters of the text. Can anybody please help?

thanks

What should the final result be? (and why does this question get 3 upvotes? It's not clear what the question is about.) — Felix Kling, Apr 18 '12 at 13:42
Try `$sub = substr(html_entity_decode(strip_tags($mytext)), 0, 50);` — DaveRandom, Apr 18 '12 at 13:43

score 2 · Accepted Answer · answered Apr 18 '12 at 13:43


Hopefully this works. Just try:

echo (substr(strip_tags($mytext), 0, 25));

http://www.ideone.com/6TgJX


Natasha


Entities anyone? Multibyte strings anyone? – hakre Apr 18 '12 at 14:01

score 2 · Answer 2 · edited May 23 '17 at 12:21

You need an HTML parser: locate and read out the plain text, then pick the substring. Here is an example with DOMXPath:

// Instantiate first: loadHTML() cannot be called statically in PHP 8
$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);
$chars50 = $xp->evaluate('substring(normalize-space(//body),1,50)');

Demo:

string(50) "This economy car is great value for money and with"

Take note that you'll get a UTF-8 encoded string here. You can do this on your own as well with regular expressions (which might help you to cut at words), for example:

# load text from HTML (instantiate first: loadHTML() cannot be called statically in PHP 8)
$doc = new DOMDocument();
$doc->loadHTML($html);
$text = $doc->getElementsByTagName('body')->item(0)->nodeValue;

# normalize HTML whitespace
$text = trim(preg_replace('/\s{1,}/u', ' ', $text));

# obtain the substring (here: UTF-8 safe operation, see as well mb_substr)
$chars50 = preg_replace('/^(.{0,50}).*$/u', '$1', $text);
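
The remark about cutting at words can be sketched roughly as follows. This is not part of the original answer: `truncate_at_word` is a hypothetical helper name, and the sketch assumes `$text` is valid UTF-8 whose whitespace has already been normalized as above.

```php
<?php
// Hypothetical helper (an assumption for illustration, not from the answer):
// return at most $limit characters of $text, backing up to the last
// whitespace when the cut would otherwise land in the middle of a word.
function truncate_at_word(string $text, int $limit = 50): string
{
    // Longest prefix of up to $limit characters that is directly followed
    // by whitespace or by the end of the string.
    if (preg_match('/^.{0,' . $limit . '}(?=\s|$)/us', $text, $m) && $m[0] !== '') {
        return rtrim($m[0]);
    }

    // No usable word boundary inside the limit (e.g. one very long word):
    // fall back to a hard cut, still counting UTF-8 characters, not bytes.
    preg_match('/^.{0,' . $limit . '}/us', $text, $m);
    return $m[0] ?? '';
}
```

With the sample text, a limit of 50 happens to end exactly after "with", while a limit of 48 would back up to "... for money and" instead of returning a chopped "wi".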


If you're using strip_tags instead of an HTML parser, you would need to deal with the different encodings on your own. As the original string already contains a question mark, which signals a Unicode replacement character, I'd say you are already dealing with borked data, so you are better off using a library like DOMDocument, which re-presents the data, instead of strip_tags, which is not safe (see the warning on the PHP manual page).
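
For comparison, dealing with the encoding on your own might look roughly like the sketch below. It is an illustration only: `html_prefix` is a hypothetical helper name, the input is assumed to be valid UTF-8 from a trusted source, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper (an assumption for illustration): plain-text prefix
// of an HTML fragment without a DOM parser. Assumes $html is valid UTF-8.
function html_prefix(string $html, int $length = 50): string
{
    // Strip tags first, so that escaped markup such as &lt;b&gt; is not
    // turned into a real tag and then removed.
    $text = strip_tags($html);

    // Decode entities with an explicit target charset.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace, as a browser would when rendering.
    $text = trim(preg_replace('/\s+/u', ' ', $text));

    // Count characters rather than bytes.
    return mb_substr($text, 0, $length, 'UTF-8');
}
```

Calling `html_prefix($mytext, 50)` would then count a decoded entity as a single character instead of cutting it in half, which a plain `substr` on the raw markup cannot guarantee.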
