I'm aware of various ways to truncate an HTML string to a certain length, with or without counting the HTML tags, and with or without preserving whole words. My issue, though, is with strings that contain HTML entities such as &ndash; or &amp;.

I need to truncate a string to 100 characters (or a few less if it would otherwise truncate in the middle of a special character). Right now I have a function:

$result= truncateIfNecessary(strip_tags($fullText), 100); //ignore HTML tags 

function truncateIfNecessary($string, $length) {
    if(strlen($string) > $length) {
        return substr($string, 0, $length).'...';
    } else {
        return $string;
    }
}

But if the string is something like text text &ndash; text (displayed on the page as: text text – text) and $length falls inside &ndash;, it returns text text &nda... which is displayed literally on the page, when I need it to return text text... instead.

EDIT:

(posted as answer)

WOUNDEDStevenJones

4 Answers


I think your problem would be solved by changing the first line of code to:

$result = strip_tags(truncateIfNecessary($fullText, 100));

That way you first adjust the length and after that take care of the HTML characters.

SharpKnight
  • This would work, but I believe it would result in incorrect lengths because it'd consider the tags as part of the length. The end result would likely be inconsistently shorter than 100 chars. – WOUNDEDStevenJones Sep 16 '13 at 18:10
  • @WOUNDEDStevenJones Yes you are right, decoding in the beginning of the function and encoding at the end would be a better solution I think. – SharpKnight Sep 16 '13 at 18:27
  • I tried that too and it didn't work 100%, but it's a lot closer than what I started with. See my edited question. – WOUNDEDStevenJones Sep 16 '13 at 18:34
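To illustrate the length concern raised in the comments above, here is a small sketch. The sample markup and the example.com URL are made up for illustration; they are not from the question.

```php
<?php
// Hypothetical sample: the opening tag alone is 41 bytes long.
$fullText = '<a href="http://example.com/a/long/path">Hello</a> world, this is text';

// Truncate first, then strip: the tag's characters use up most of the
// 50-byte budget, so very little visible text is left.
$truncateFirst = strip_tags(substr($fullText, 0, 50));

// Strip first, then truncate: the budget is spent on visible text only.
$stripFirst = substr(strip_tags($fullText), 0, 50);

echo $truncateFirst, "\n"; // Hello
echo $stripFirst, "\n";    // Hello world, this is text
```

A cut that lands in the middle of a tag is a separate risk with the truncate-first order, since strip_tags then has to deal with an unterminated tag.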

Use PHP's wordwrap function, something like this:

$result = wordwrap(strip_tags($fullText), 100, "...\n"); // Remove HTML and split
$result = explode("\n", $result);
$result = $result[0]; // Select the first group of 100 characters
  • Does this have anything to do with HTML special characters? – WOUNDEDStevenJones Sep 16 '13 at 18:50
  • No, the wordwrap function only extracts up to X characters from a given string, using the space character as the delimiter (it always keeps whole words). – Alberto Rivas Sep 16 '13 at 18:55
  • @WOUNDEDStevenJones Actually, yes. As far as wordwrap is concerned, an HTML entity is just a word, so you will either get it in full or not at all. Texts without spaces need special handling, though. I would also recommend using `\0` instead of `\n`. – user Mar 18 '14 at 03:55
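For the no-spaces case mentioned in the last comment, one possible approach is to cut by bytes and then drop any entity left unterminated at the end. This is only a sketch: the function name `truncateKeepingEntities` and the regular expression are illustrative and do not come from the thread.

```php
<?php
// Hypothetical helper (not from the thread): byte-based cut that backs up
// when the cut lands inside an HTML entity such as &ndash;.
function truncateKeepingEntities($string, $length, $suffix = '...') {
    if (strlen($string) <= $length) {
        return $string;
    }

    $cut = substr($string, 0, $length);

    // Remove an unterminated entity at the very end, e.g. "&", "&nda",
    // "&#82". A complete entity ends in ";" and is left alone.
    $cut = preg_replace('/&#?[0-9a-zA-Z]*$/', '', $cut);

    // Trim the whitespace that preceded the removed entity.
    return rtrim($cut) . $suffix;
}

echo truncateKeepingEntities('text text &ndash; text', 13), "\n"; // text text...
```

Note that this still counts &ndash; as seven characters, so the visible result can be shorter than the requested length, which the question allows ("or a few less").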

I tried

function truncateIfNecessary($string, $length) {
    if(strlen($string) > $length) {
        $string = html_entity_decode(strip_tags($string));
        $string = substr($string, 0, $length).'...';
        $string = htmlentities($string);
        return $string;
    } else {
        return strip_tags($string);
    }
}

but for some reason it missed a few &ndash; and &bull;. For now, the solution at http://alanwhipple.com/2011/05/25/php-truncate-string-preserving-html-tags-words/ (linked from "Shortening text tweet-like without cutting links inside") works perfectly: it handles HTML tags, preserves whole words (or not), and handles HTML entities. Now it's just:

function truncateIfNecessary($string, $length) {
    if(strlen($string) > $length) {
        return truncateHtml($string, $length, "...", true, true);
    } else {
        return strip_tags($string);
    }
}
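A possible reason the decode/encode attempt missed some entities is the character set: before PHP 5.4, html_entity_decode defaulted to ISO-8859-1, which cannot represent the en dash or the bullet, and substr cuts by bytes rather than characters. The following is a sketch of a character-aware variant, assuming the text is UTF-8 and the mbstring extension is available; the name `truncateDecoded` is made up for illustration.

```php
<?php
// Hypothetical variant (name is illustrative): count visible characters
// instead of bytes. Assumes UTF-8 input and the mbstring extension.
function truncateDecoded($html, $length, $suffix = '...') {
    // Decode entities to real UTF-8 characters so "&ndash;" counts as one.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

    if (mb_strlen($text, 'UTF-8') > $length) {
        // mb_substr cuts on character boundaries; substr could split a
        // multi-byte character such as the en dash.
        $text = rtrim(mb_substr($text, 0, $length, 'UTF-8')) . $suffix;
    }

    // Re-escape &, <, > and quotes for output. Other characters, such as
    // the en dash, are emitted as literal UTF-8.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

With this sketch, `truncateDecoded('text text &ndash; text', 11)` keeps the dash whole and appends the suffix. The page must be served as UTF-8 for the literal dash to display correctly.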
WOUNDEDStevenJones
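function _truncate($string, $lenMax = 100) {
    // Strip tags first so the length check measures the visible text.
    $string = strip_tags($string);

    if (strlen($string) > $lenMax) {
        $string = substr($string, 0, $lenMax);
        // Cut back to the last space so no word (or entity) is split.
        // If there is no space at all, keep the hard cut.
        $lastSpace = strrpos($string, ' ');
        if ($lastSpace !== false) {
            $string = substr($string, 0, $lastSpace);
        }
        $string .= '...';
    }

    return $string;
}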
Moon