I'm working in PHP and I want to create a function that, given a text of arbitrary length and height, returns a restricted version of the same text with a maximum of 500 characters and 10 lines.

This is what I have so far:

function preview($str)
{
    $partialPreview = explode("\n", substr($str, 0, 500));
    $partialPreviewHeight = count($partialPreview);
    $finalPreview = "";

    // if it has more than 10 lines
    if ($partialPreviewHeight > 10) {
        // keep the first 10 lines and restore the newlines that explode() removed
        $finalPreview = implode("\n", array_slice($partialPreview, 0, 10));
    } else {
        $finalPreview = substr($str, 0, 500);
    }

    return $finalPreview;
}

I have two questions:

  • Is using \n the proper way to detect line breaks? I know that some systems use \n, others \r\n and others \r, but \n is the most common.
  • Sometimes, if there's an HTML entity like &quot; (quotation mark) at the end, it's left as &quot, and therefore it's not valid HTML. How can I prevent this?
federico-t

2 Answers


First replace <br /> tags with <br />\n and </p><p> or </div><div> with </p>\n<p> and </div>\n<div> respectively.

Then use PHP's strip_tags() function, which should yield plain text with a newline in every place a newline should be.

Then you could replace \r\n with \n for consistency. Only after that should you extract the desired length of text.

You may want to use word wrapping to achieve your 10-line goal. For word wrapping to work you need to define a number of characters per line; wordwrap() then takes care of not breaking mid-word.

You may want to call html_entity_decode() before using wordwrap(), as @PeeHaa suggested.
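
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the function name htmlToPreviewText() and the default of 50 characters per line are made up for the example, the input is assumed to be UTF-8, and wordwrap() counts bytes rather than characters, so lines with multibyte text may wrap earlier than expected.

```php
<?php
// Hypothetical helper; the name and the default values are assumptions.
function htmlToPreviewText(string $html, int $lineWidth = 50, int $maxLines = 10): string
{
    // Normalize Windows and old Mac line endings to "\n" first.
    $text = str_replace(array("\r\n", "\r"), "\n", $html);

    // Turn <br> tags (and a newline directly after them) into one newline.
    $text = preg_replace('~<br\s*/?>\n?~i', "\n", $text);

    // Put a newline between adjacent paragraphs or divs.
    $text = preg_replace('~</(?:p|div)>\s*(?=<(?:p|div)\b)~i', "\n", $text);

    // Drop the remaining tags; the text and the newlines stay.
    $text = strip_tags($text);

    // Decode entities so none of them can be cut in half later on.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Wrap long lines without breaking words, then keep the first lines.
    $wrapped = wordwrap($text, $lineWidth, "\n", false);
    $lines = array_slice(explode("\n", $wrapped), 0, $maxLines);

    return implode("\n", $lines);
}

echo htmlToPreviewText('<p>Hello &quot;world&quot;</p><p>Second<br />line</p>');
```

The result is plain text, so it should be passed through htmlspecialchars() again before it is printed inside an HTML page.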

Mihai Stancu

Is using \n the proper way to detect line breaks? I know that some systems use \n, others \r\n and others \r, but \n is the most common.

It depends where the data is coming from. Different operating systems have different line breaks.

Windows uses \r\n, *nix (including macOS) uses \n, and (very) old Macs used \r. If the data is coming from the web (e.g. a textarea), it will (or at least should) always be \r\n, because that's what the spec states user agents should do.

Sometimes, if there's an HTML entity like &quot; (quotation mark) at the end, it's left as &quot, and therefore it's not valid HTML. How can I prevent this?

Before cutting the text you may want to convert HTML entities back to normal text, using either htmlspecialchars_decode() or html_entity_decode() depending on your needs. That way you can't cut an entity in half (don't forget to encode the text again if needed).

Another option would be to only break the text on whitespace characters rather than a hard character limit. This way you will only have whole words in your "summary".
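
A minimal sketch of that whitespace-based cut, assuming UTF-8 input and the mbstring extension. The function name truncateAtWhitespace() is made up for this example:

```php
<?php
// Hypothetical helper: cut at the last whitespace within the limit.
function truncateAtWhitespace(string $text, int $maxCharacters = 500): string
{
    if (mb_strlen($text, 'UTF-8') <= $maxCharacters) {
        return $text; // already short enough
    }

    // Take one extra character so a cut that lands exactly on a word
    // boundary still keeps the whole last word.
    $cut = mb_substr($text, 0, $maxCharacters + 1, 'UTF-8');

    // Greedy match: everything up to the last whitespace character.
    if (preg_match('/^(.*)\s/su', $cut, $matches)) {
        $candidate = rtrim($matches[1]);
        if ($candidate !== '') {
            return $candidate;
        }
    }

    // No usable whitespace (one very long word): fall back to a hard cut.
    return mb_substr($text, 0, $maxCharacters, 'UTF-8');
}

echo truncateAtWhitespace('The quick brown fox', 12); // prints "The quick"
```

The fallback to a hard cut is a design choice; returning an empty string or appending an ellipsis would work just as well, depending on what the preview should look like.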

I've created a class which should deal with most of these issues. As I already stated, when the data is coming from a textarea it will always be \r\n, but to be able to parse other line breaks I came up with something like the following (untested):

class Preview
{
    protected $maxCharacters;
    protected $maxLines;
    protected $encoding;
    protected $lineBreaks;

    public function __construct($maxCharacters = 500, $maxLines = 10, $encoding = 'UTF-8', array $lineBreaks = array("\r\n", "\r", "\n"))
    {
        $this->maxCharacters = $maxCharacters;
        $this->maxLines = $maxLines;
        $this->encoding = $encoding;
        $this->lineBreaks = $lineBreaks;
    }

    public function makePreview($text)
    {
        $text = $this->normalizeLinebreaks($text);

        // decoding first prevents cutting entities such as &quot; in half
        $text = html_entity_decode($text, ENT_QUOTES, $this->encoding);

        $text = $this->limitLines($text);

        if (mb_strlen($text, $this->encoding) > $this->maxCharacters) {
            $text = $this->limitCharacters($text);
        }

        // re-encode so the preview is valid to output as HTML again
        return htmlentities($text, ENT_QUOTES, $this->encoding);
    }

    protected function normalizeLinebreaks($text)
    {
        return str_replace($this->lineBreaks, "\n", $text);
    }

    protected function limitLines($text)
    {
        $lines = explode("\n", $text);
        $limitedLines = array_slice($lines, 0, $this->maxLines);

        return implode("\n", $limitedLines);
    }

    protected function limitCharacters($text)
    {
        // multibyte-safe cut, consistent with the mb_strlen() check in makePreview()
        return mb_substr($text, 0, $this->maxCharacters, $this->encoding);
    }
}

$preview = new Preview();
echo $preview->makePreview('Some text which will be turned into a preview.');
PeeHaa