You need an HTML parser to locate and read out the plain text, and then pick the substring. Here is an example with DOMXPath:
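```php
$doc = new DOMDocument();
$doc->loadHTML($html); // loadHTML() is an instance method; calling it statically fails in PHP 8
$xp = new DOMXPath($doc);
// XPath substring() is 1-based and counts characters, not bytes
$chars50 = $xp->evaluate('substring(normalize-space(//body),1,50)');
```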
Output of `var_dump($chars50)`:
```
string(50) "This economy car is great value for money and with"
```
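The snippet above assumes `$html` is in an encoding `loadHTML()` recognizes. When the document carries no charset declaration, libxml treats the input as ISO-8859-1, so UTF-8 text comes out garbled. A minimal sketch, assuming the input is UTF-8, wraps the XPath approach in a hypothetical `firstChars()` helper and prepends a `<meta>` charset declaration so the parser decodes the bytes as UTF-8:

```php
<?php
/**
 * Hypothetical helper: first $length characters of the normalized body text.
 * Assumption: $html is UTF-8 encoded.
 */
function firstChars(string $html, int $length): string
{
    $doc = new DOMDocument();
    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Without a charset declaration libxml assumes ISO-8859-1; this meta tag
    // tells it to decode the input as UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    // normalize-space() collapses whitespace; substring() counts characters.
    return (string) $xp->evaluate(sprintf('substring(normalize-space(//body), 1, %d)', $length));
}
```

For example, `firstChars('<p>This   economy car</p>', 12)` returns `"This economy"`.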
Note that you'll get a UTF-8 encoded string here. You can also do this yourself with regular expressions (which can help if you want to cut at word boundaries), for example:
```php
# load text from HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
$text = $doc->getElementsByTagName('body')->item(0)->textContent;
# normalize HTML whitespace
$text = trim(preg_replace('/\s+/u', ' ', $text));
# obtain the substring (UTF-8 safe thanks to the /u modifier; see also mb_substr)
$chars50 = preg_replace('/^(.{0,50}).*$/u', '$1', $text);
```
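The regular expression route makes it easy to avoid cutting a word in half. A minimal sketch, using a hypothetical `cutAtWord()` helper and assuming the text is valid UTF-8 (as it is when it comes out of `DOMDocument`): it takes the longest prefix of at most `$max` characters that is followed by a space or the end of the string, and falls back to a hard cut when the first word alone is longer than `$max`.

```php
<?php
/**
 * Hypothetical helper: cut text at a word boundary, at most $max characters.
 * Assumptions: $text is valid UTF-8 and $max >= 1.
 */
function cutAtWord(string $text, int $max = 50): string
{
    // Collapse runs of whitespace to single spaces and trim the ends.
    $text = trim(preg_replace('/\s+/u', ' ', $text));

    // Greedy .{0,$max} backtracks until the next character is a space or the
    // string ends, so the match never ends in the middle of a word.
    if (preg_match('/^.{0,' . $max . '}(?=\s|$)/u', $text, $m)) {
        return $m[0];
    }

    // The first word is longer than $max: hard cut of $max characters.
    preg_match('/^.{0,' . $max . '}/u', $text, $m);
    return $m[0];
}
```

For example, `cutAtWord('This economy car is great value', 10)` returns `"This"`, because every longer prefix of up to 10 characters would end mid-word.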
If you use `strip_tags` instead of an HTML parser, you have to deal with the different encodings yourself. Since the original string already contains the question mark that signals a Unicode replacement character, I'd say you are already dealing with broken data, so you are better off using a library that parses the markup and re-presents the text, like `DOMDocument`, instead of `strip_tags`, which is not safe (see the warning on its PHP manual page).
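To illustrate one practical difference, here is a small comparison (the input string is made up for the example): `strip_tags()` only removes the tags and leaves HTML entities encoded, while `DOMDocument` parses the markup and hands back decoded UTF-8 text.

```php
<?php
$html = '<p>Fish &amp; chips &ndash; &quot;great&quot; value</p>';

// strip_tags() removes the tags but leaves entities untouched.
$viaStripTags = strip_tags($html);

// DOMDocument parses the markup and decodes entities into UTF-8 text.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$viaDom = $doc->getElementsByTagName('body')->item(0)->textContent;

echo $viaStripTags, "\n"; // Fish &amp; chips &ndash; &quot;great&quot; value
echo $viaDom, "\n";       // Fish & chips – "great" value
```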