Is using \n the proper way to detect line breaks? I know that some systems use \n, others \r\n and others \r, but \n is the most common.
It depends where the data is coming from. Different operating systems use different line breaks: Windows uses \r\n, *nix (including macOS) uses \n, and (very) old Macs used \r. If the data is coming from the web (e.g. a textarea) it will (or at least should) always be \r\n, because that's what the spec states user agents should do.
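To make the point about differing line breaks concrete, here is a minimal sketch that collapses every variant to \n before counting lines. The function name normalizeNewlines() is hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper: collapse Windows (\r\n) and old Mac (\r) line breaks to \n.
function normalizeNewlines(string $text): string
{
    // "\r\n" must be listed first, otherwise it would become two line breaks.
    return str_replace(array("\r\n", "\r"), "\n", $text);
}

// Mixed input: one Windows, one old Mac and one *nix line break.
$lines = explode("\n", normalizeNewlines("one\r\ntwo\rthree\nfour"));
echo count($lines); // 4
```

After normalizing, splitting on "\n" alone is enough, whatever system the text came from.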
Sometimes, if there's an HTML entity like &quot; (quotation mark) at the end, it's left truncated (for example as &quo), and therefore it's not valid HTML. How can I prevent this?
Before cutting the text you may want to convert HTML entities back to normal text, using either htmlspecialchars_decode() or html_entity_decode() depending on your needs. That way you can no longer cut an entity in half (don't forget to encode the text again afterwards if needed).
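The decode, cut, encode order can be sketched roughly as follows. The function name truncateEscaped() is hypothetical, the sketch assumes the mbstring extension is available, and it uses htmlspecialchars() for the final encoding step, which may not be the right choice for every use case.

```php
<?php
// Hypothetical helper: truncate HTML-escaped text without splitting an entity.
function truncateEscaped(string $escaped, int $limit, string $encoding = 'UTF-8'): string
{
    // Turn entities such as &quot; back into the characters they stand for.
    $plain = html_entity_decode($escaped, ENT_QUOTES, $encoding);

    // Cut by characters (not bytes) so multibyte characters are not split either.
    $cut = mb_substr($plain, 0, $limit, $encoding);

    // Escape again so the result can be placed in HTML.
    return htmlspecialchars($cut, ENT_QUOTES, $encoding);
}

echo truncateEscaped('He said &quot;hi&quot; to me', 9); // He said &quot;
```

Because the limit is applied to the decoded text, the quotation mark counts as one character and is either kept whole or dropped whole.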
Another option would be to only break the text on whitespace characters rather than a hard character limit. This way you will only have whole words in your "summary".
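A rough sketch of breaking on whitespace instead of at a hard limit is shown here. The function name truncateAtWord() is hypothetical, and the sketch assumes UTF-8 input and the mbstring extension; when the text contains no whitespace before the limit it falls back to a hard cut.

```php
<?php
// Hypothetical helper: cut at the last whitespace at or before $limit characters.
function truncateAtWord(string $text, int $limit): string
{
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }

    // Take one extra character so a word ending exactly at the limit is kept whole.
    $cut = mb_substr($text, 0, $limit + 1, 'UTF-8');

    // Drop the last run of whitespace and the (possibly partial) word after it.
    $trimmed = preg_replace('/\s+\S*$/u', '', $cut);

    // No usable whitespace (or a regex error): fall back to a hard cut.
    if ($trimmed === null || $trimmed === '' || $trimmed === $cut) {
        return mb_substr($text, 0, $limit, 'UTF-8');
    }

    return $trimmed;
}

echo truncateAtWord('The quick brown fox jumps', 12); // The quick
```

The result can be shorter than the limit, which is the price of never showing half a word.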
I've created a class which should deal with most issues. As I already stated, data coming from a textarea will always use \r\n, but to be able to parse other line breaks I came up with something like the following (untested):
class Preview
{
    protected $maxCharacters;
    protected $maxLines;
    protected $encoding;
    protected $lineBreaks;

    public function __construct($maxCharacters = 500, $maxLines = 10, $encoding = 'UTF-8', array $lineBreaks = array("\r\n", "\r", "\n"))
    {
        $this->maxCharacters = $maxCharacters;
        $this->maxLines = $maxLines;
        $this->encoding = $encoding;
        $this->lineBreaks = $lineBreaks;
    }

    public function makePreview($text)
    {
        $text = $this->normalizeLinebreaks($text);

        // decode first so that entities such as &quot; cannot be cut in half
        $text = html_entity_decode($text, ENT_QUOTES, $this->encoding);

        $text = $this->limitLines($text);

        if (mb_strlen($text, $this->encoding) > $this->maxCharacters) {
            $text = $this->limitCharacters($text);
        }

        // encode again so the preview is safe to output as HTML
        return htmlspecialchars($text, ENT_QUOTES, $this->encoding);
    }

    protected function normalizeLinebreaks($text)
    {
        // "\r\n" has to come before "\r" and "\n" in $lineBreaks
        return str_replace($this->lineBreaks, "\n", $text);
    }

    protected function limitLines($text)
    {
        $lines = explode("\n", $text);
        $limitedLines = array_slice($lines, 0, $this->maxLines);

        return implode("\n", $limitedLines);
    }

    protected function limitCharacters($text)
    {
        // mb_substr() counts characters rather than bytes, matching the mb_strlen() check
        return mb_substr($text, 0, $this->maxCharacters, $this->encoding);
    }
}

$preview = new Preview();
echo $preview->makePreview('Some text which will be turned into a preview.');