I'm writing a blog and I need a function that shows an excerpt of each post. Right now I use substr after checking whether the text is longer than 503 characters.

But this often cuts my text in the middle of a word AND in the middle of an HTML tag, so the rest of the page ends up formatted by the unclosed tag.

E.g.: "text text text <strong>Another piece of te [...]" and the rest of the page is strong until it hits the next closing </strong> tag.

I tried removing some elements from the post, but that strips the formatting from my text.

How do I go about saying: "OK, the text is 980 chars; cut it at 503 plus whatever else is needed to reach the next dot (.) or complete the tag"?

My current code follows:

<?php
  $testo_preview = preg_replace("/<img[^>]+\>/i", ' ', $valore->testo);
  $testo_preview = preg_replace("/<a[^>]+>/i", '<a>', $testo_preview);
  $testo_preview = preg_replace("/<span[^>]+>/i", '<span>', $testo_preview);
  $testo_preview = preg_replace("/<div[^>]+>/i", '', $testo_preview);
  $testo_preview = str_replace("</div>", '', $testo_preview);
  $testo_preview = str_replace("\n", '<br>', $testo_preview); 
?>

<?php if(strlen($testo_preview) >= 503): ?>

   <?= substr($testo_preview, 0, 503).' [...]' ?>

<?php else: ?>

   <?= $testo_preview; ?>

<?php endif; ?>

Edit:

I found Pawel's answer to work OK, as it actually "gets to the point"...

I had to change the new DOMDocument() part because it was messing up the HTML accents (in Italian we use accented characters and I needed them to stay). I turned it into a function by taking part of the code from Tigger, so I upvoted both of you. I came up with a simple function:

function cleanCut($cutAt, $str) {
    $next_dot = strpos($str, '.', $cutAt);
    if ($next_dot !== false) {
        // text after default cutoff contains a dot so we need to extend the cutoff
        $preview_text = substr($str, 0, $next_dot + 1);
        // HTML Cleanup
        $preview_text = strip_tags($preview_text);
        $preview_text = str_replace("\n", '<br>', $preview_text);
    } else {
        $preview_text = $str;
    }

    return $preview_text;
}

It works fine. Only sometimes it doesn't get to the dot (when there is a long link), but that is acceptable. As you can see from the function, I tried to replace \n with <br>, as it is the only tag I actually want, but it doesn't work. Any idea why?

Mr.Web
  • http://stackoverflow.com/a/1732454/1180785 – Dave Aug 08 '13 at 01:08
  • After edit: Change `strip_tags($preview_text)` to `strip_tags($preview_text, '<br>')` and remove the next line. See the [PHP manual](http://php.net/strip_tags) for more info about `strip_tags()`. – Tigger Aug 08 '13 at 10:09

2 Answers

If I'm not wrong, you can just ignore the tags for a moment: find the last period you need, then clean up the open tags. So one approach would be to:

1. Find the position of the first dot after 503 characters, using the offset argument of strpos. If none is found, show the whole text; otherwise cut the string at that point.
2. Clean up the HTML to close any open tags.
3. Since DOMDocument outputs a full HTML document, strip the excess.

Example:

$max_length = 16;
$full_text  = "<b>Lorem ****. Impsum ****. That's already too long.</b>";
$next_dot   = strpos($full_text, '.', $max_length);

if ($next_dot !== false)
{
    // text after default cutoff contains a dot so we need to extend the cutoff
    // (+1 so that the last dot is included)
    $preview_text = substr($full_text, 0, $next_dot + 1);
    // HTML Cleanup: let DOMDocument close any tags left open by the cut
    $doc = new DOMDocument();
    $doc->loadHTML($preview_text);
    $preview_text = $doc->saveHTML();
    // keep only what is between <body> and </body>
    $preview_text = preg_replace('/(.*)<body>|(<\/body>.*)/ism', '', $preview_text);
} else {
    $preview_text = $full_text;
}

echo $preview_text;

It is a bit naive and there are a few obvious problems with it, but either (a) it will suffice, (b) you'll be able to improve it on your own, or (c) you'll ask more questions :)
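For what it's worth, the three steps above could also be wrapped into a function along these lines. This is only a sketch under a few assumptions: the name previewWithClosedTags is made up for illustration, the body is read back node by node instead of with a regex, and parser warnings about the unbalanced fragment are silenced with libxml_use_internal_errors. It does not try to solve the accented-character issue mentioned in the question's edit, and a dot inside an attribute (such as a link URL) can still confuse the cut.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the answer above).
// Cuts after the first dot at or past $maxLength, then lets DOMDocument close open tags.
function previewWithClosedTags(string $html, int $maxLength): string
{
    // Short texts are returned untouched; this also keeps the strpos offset valid.
    if (strlen($html) <= $maxLength) {
        return $html;
    }

    $nextDot = strpos($html, '.', $maxLength);
    if ($nextDot === false) {
        return $html; // no dot after the cutoff: show the whole text
    }

    $cut = substr($html, 0, $nextDot + 1); // +1 keeps the dot itself

    // The fragment is unbalanced on purpose, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($cut);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $cut; // nothing was parsed; fall back to the raw cut
    }

    // Serialise only the children of <body>, skipping the doctype/html wrapper.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return trim($out);
}

echo previewWithClosedTags('<b>Lorem one. Ipsum two. That is already too long.</b>', 16);
```

Reading the children of body avoids depending on the exact shape of the wrapper that saveHTML() produces for a full document.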

Pawel

This function will cut a string cleanly at a certain point or just after it, and remove all HTML tags as well. The &#8230; is the HTML entity for '...' as a single character.

// Strips HTML tags and returns the string cut cleanly at a word boundary,
// at a certain point or just after it.
function cleanCut($cutAt, $str) {
    $cleanStr = ''; // initialise the accumulator before appending to it
    $words = explode(' ', strip_tags($str));
    foreach ($words as $word) {
        $cleanStr .= $word.' ';
        if (strlen($cleanStr) >= $cutAt) {
            return trim($cleanStr).'&#8230;';
        }
    }
    // and in case it is a short string
    return trim($cleanStr);
}
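
If line breaks are the only formatting you want to keep, one possible variant is sketched below. It is an illustration only: the name cleanCutKeepBreaks is hypothetical, lengths are counted in bytes as with strlen, and the idea is to turn any existing <br> into a newline first, strip every tag, cut at whitespace, and only then turn newlines back into <br>.

```php
<?php
// Hypothetical variant (name and behaviour are assumptions for illustration).
// Produces a tag-free excerpt in which only line breaks survive, as <br>.
function cleanCutKeepBreaks(int $cutAt, string $str): string
{
    // Turn <br>, <br/> and <br /> into newlines so they survive strip_tags().
    $text = preg_replace('/<br\s*\/?>/i', "\n", $str);
    $text = trim(strip_tags($text));

    if (strlen($text) > $cutAt) {
        // Extend the cut to the next space, tab or newline so no word is split.
        if (preg_match('/[ \t\r\n]/', $text, $m, PREG_OFFSET_CAPTURE, $cutAt)) {
            $text = rtrim(substr($text, 0, $m[0][1])) . '&#8230;';
        }
    }

    // With every tag gone, newlines can safely become <br> again.
    return str_replace("\n", '<br>', $text);
}

echo cleanCutKeepBreaks(12, "<p>First line<br />second <strong>line</strong> of text</p>");
// prints: First line<br>second&#8230;
```

Doing the newline-to-<br> replacement last means the cut can never land inside a tag, because the text contains no tags at the moment it is cut.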
Tigger