I have some HTML generated by a WYSIWYG-editor (WordPress).
I'd like to show a preview of this HTML, by only showing up to 3 lines of text (in HTML format).

Example HTML (always formatted with newlines):

<p>Hello, this is some generated HTML.</p>
<ol>
    <li>Some list item</li>
    <li>Some list item</li>
    <li>Some list item</li>
</ol>

I'd like to preview a maximum of 4 lines of text in this formatted HTML.

Example preview to display: (numbers represent line numbers, not actual output).

  1. Hello, this is some generated HTML.
  2. Some list item
  3. Some list item

Would this be possible with regex, or is there another method I could use?
I know this would be possible with JavaScript in a 'hacky' way, as asked and answered in this post.
But I'd like to do this purely on the server side (with PHP), possibly with SimpleXML?

Daan
  • [Here are some answers](https://stackoverflow.com/questions/22391638/how-to-count-number-of-lines-from-html-out-in-php) for an .html file, [and here](https://stackoverflow.com/questions/2162497/efficiently-counting-the-number-of-lines-of-a-text-file-200mb) for a text file. – Alexis Dalai Waldo Jiménez Dec 08 '20 at 18:59
  • Thanks for the links, I checked them out. I think my question is different from the earlier HTML one, because it counts every HTML tag as a line. I don't want `inline` element tags to represent a line. And `<br>` needs to be counted as a line in my case as well. – Daan Dec 08 '20 at 19:15
  • There is no easy way to accomplish this. Do you need it only for presentation reasons? – Jorge Miguel Sanchez Dec 08 '20 at 19:38
  • Yes, I want to have a consistent number of lines to preview. This would help users read part of the text before clicking on the link. For example, when displaying search results, you will want to show only 3 lines in the preview. @JorgeMiguelSanchez – Daan Dec 08 '20 at 19:45
  • Trying to find a PHP/server-side solution does not appear to make much sense to me here to begin with, at least not given the parameters/requirements given so far. You won’t know how wide the text will go in the first place, if you know nothing about the client. What if the text inside one of those LI was not just “Some list item”, but 15 or fifty times that? Surely that single LI would break into several lines on the client at some point already … – CBroe Dec 09 '20 at 09:15

2 Answers

It's really easy with XPath:

$string = '<p>Hello, this is some generated HTML.</p>
    <ol>
        <li>Some list item</li>
        <li>Some list item</li>
        <li>Some list item</li>
    </ol>';

// Convert to SimpleXML object
// A root element is required so we can just blindly add this
// or else SimpleXMLElement will complain
$xml = new SimpleXMLElement('<root>'.$string.'</root>');

// Get all the text() nodes
// I believe there is a way to select non-empty nodes here but we'll leave that logic for PHP
$result = $xml->xpath('//text()');

// Loop the nodes and display 4 non-empty text nodes
$i = 0;
foreach( $result as $key => $node )
{
    if(trim($node) !== '')
    {
        echo ++$i.'. '.htmlentities(trim($node)).'<br />'.PHP_EOL;
        if($i === 4)
        {
            break;
        }
    }
}

Output:

1. Hello, this is some generated HTML.<br />
2. Some list item<br />
3. Some list item<br />
4. Some list item<br />
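
The code comment above mentions that non-empty nodes could probably be selected in the query itself. A minimal sketch of that idea follows, using the XPath predicate `[normalize-space()]`. It swaps SimpleXML for `DOMDocument`/`DOMXPath` so that each result is an actual text node, and the function name `firstTextLines` is made up for illustration. It assumes the markup is well-formed XML (no unclosed tags, no HTML-only entities such as `&nbsp;`).

```php
<?php
// Hypothetical helper, not part of the original answer.
// Returns up to $limit non-empty text nodes, trimmed, in document order.
function firstTextLines(string $html, int $limit = 4): array
{
    // Collect parse errors internally instead of emitting warnings
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // A single root element is required, so wrap the fragment
    $loaded = $doc->loadXML('<root>' . $html . '</root>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return []; // Assumption: malformed input simply yields no preview
    }

    $xpath = new DOMXPath($doc);
    $lines = [];

    // normalize-space() is an empty string for whitespace-only text nodes,
    // so the predicate keeps only nodes with visible text
    foreach ($xpath->query('//text()[normalize-space()]') as $node) {
        $lines[] = trim($node->nodeValue);
        if (count($lines) === $limit) {
            break;
        }
    }

    return $lines;
}

$string = '<p>Hello, this is some generated HTML.</p>
    <ol>
        <li>Some list item</li>
        <li>Some list item</li>
        <li>Some list item</li>
    </ol>';

foreach (firstTextLines($string, 3) as $i => $line) {
    echo ($i + 1) . '. ' . htmlentities($line) . '<br />' . PHP_EOL;
}
```

Note that this counts text nodes, not rendered lines, so inline markup inside a paragraph would still split it into several entries.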
MonkeyZeus

I have personally coded the following function, which isn't perfect, but works fine for me.

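function returnHtmlLines($html, $amountOfLines = 4) {
    // Split on line breaks and drop empty lines (a line containing only "0" is kept)
    $lines_arr = preg_split('/\n|\r/', $html, -1, PREG_SPLIT_NO_EMPTY);

    // Keep only the first $amountOfLines source lines
    $linesToReturn = array_slice($lines_arr, 0, $amountOfLines);

    // Join the lines and strip indentation (any run of two or more whitespace characters)
    return preg_replace('/\s{2,}/', '', implode('', $linesToReturn));
}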

Which returns the following HTML when using echo:

<p>Hello, this is some generated HTML.</p><ol><li>Some list item</li><li>Some list item</li>

Or formatted:

<p>Hello, this is some generated HTML.</p>
<ol>
    <li>Some list item</li>
    <li>Some list item</li>

Browsers will automatically close the <ol> tag, so it works fine for my needs.

Here is a Sandbox example
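
If relying on the browser to close the `<ol>` tag feels too fragile, one option is to let PHP's HTML parser balance the truncated fragment before output. The following is only a sketch under some assumptions: `balanceHtmlFragment` is a made-up name, the exact whitespace of the serialized result depends on libxml, and character encoding handling for non-ASCII text is left out.

```php
<?php
// Hypothetical helper: re-serializes a truncated fragment so that
// tags left open by the cut-off get closed by the parser.
function balanceHtmlFragment(string $fragment): string
{
    // The parser reports unclosed tags as errors; collect them silently
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Explicit wrapper so the fragment always ends up inside <body>
    $doc->loadHTML('<html><body>' . $fragment . '</body></html>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize only the children of <body>, not the wrapper itself
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

$preview = '<p>Hello, this is some generated HTML.</p><ol><li>Some list item</li><li>Some list item</li>';
echo balanceHtmlFragment($preview);
```

The echoed markup should then contain a closing `</ol>` of its own, so the preview no longer depends on the browser's error recovery.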

Daan