
I have this text: http://pastebin.com/2Zgbs7hi

I want to remove the HTML code from it and display just the plain text, but I want to keep at least one line break wherever there are currently several line breaks.

I have tried:

$ticket["summary"] = 'pastebin example';

$TicketSummaryDisplay = nl2br($ticket["summary"]);
$TicketSummaryDisplay = stripslashes($TicketSummaryDisplay);
$TicketSummaryDisplay = trim(strip_tags($TicketSummaryDisplay));
$TicketSummaryDisplay = preg_replace('/\n\s+$/m', '', $TicketSummaryDisplay);
echo $TicketSummaryDisplay;

That displays it as plain text, but it shows everything as one big block of text with no line breaks at all.

Barmar
  • I would suggest you try some regex for this – TheMohanAhuja Mar 20 '14 at 15:04
  • @TheMohanAhuja: I would suggest you read http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Wooble Mar 20 '14 at 15:11
  • Or instead of Regex you could use an XML parser or XPath if you have valid (X)HTML. – Alex van den Hoogen Mar 20 '14 at 15:42
  • Do you have an example, please? – Mar 20 '14 at 20:54
  • This question seems to be primarily about the regular expression for removing the line breaks. Removing the HTML tags is incidental to the problem stated: **"it shows it all as one big block of text with no line breaks at all"**. What are you really looking for? Just the text within certain HTML tags? Or everything from the source that isn't markup? Try pasting in all or part of your desired output. – Patrick M Apr 03 '14 at 20:11
  • I'm looking to remove the HTML tags but keep the line breaks, just with smaller gaps between the lines. It's code from emails that get inserted into a database; I want it to show the way a normal email would in an email editor/viewer – Apr 03 '14 at 21:02
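For reference, here is a minimal sketch of the behaviour asked for above: plain text, with runs of blank lines collapsed into a single blank line. It assumes UTF-8 input and that paragraphs end with </p>, </div>, <br> or a similar tag; the function name htmlEmailToText() and the list of block tags are illustrative choices, not taken from any of the answers.

```php
<?php
// Hypothetical helper: convert an HTML email body to plain text, keeping at
// most one blank line between blocks. Assumes the input is UTF-8.
function htmlEmailToText(string $html): string
{
    // Turn line-breaking tags into newlines before the tags are stripped.
    $html = preg_replace('~<br\s*/?>~i', "\n", $html);
    $html = preg_replace('~</(p|div|li|tr|h[1-6])>~i', "\n", $html);

    // Remove the remaining tags, then decode entities such as &nbsp; and &#8217;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // A decoded &nbsp; is U+00A0, which trim() does not remove.
    $text = str_replace("\u{00A0}", ' ', $text);

    // Split on any newline sequence, trim each line, rejoin with "\n".
    $lines = array_map('trim', preg_split('/\R/u', $text));
    $text  = implode("\n", $lines);

    // Three or more newlines (two or more blank lines) become one blank line.
    $text = preg_replace("/\n{3,}/", "\n\n", $text);

    return trim($text);
}
```

To display the result in a browser you could then echo nl2br(htmlspecialchars(htmlEmailToText($html))), where $html holds the email body.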

9 Answers


Maybe this will save you some time.

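<?php
libxml_use_internal_errors(true); // suppress libxml warnings about the non-standard "o" tags in this markup
$html = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$dom = new DOMDocument;
$dom->loadHTML($html);

$result = '';
foreach ($dom->getElementsByTagName('p') as $node) {
    // stop before the legal disclaimer at the end of the email
    if (strstr($node->nodeValue, 'Legal Disclaimer:')) {
        break;
    }
    // end each paragraph with a newline so the text is not one big block
    $result .= $node->nodeValue . "\n";
}
echo nl2br($result);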
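A self-contained variant of this DOMDocument approach that runs without the network: it puts each paragraph on its own line and skips paragraphs that contain only whitespace or &nbsp;. The function name paragraphsToText() and the inline sample HTML used with it are made up for illustration.

```php
<?php
// Hypothetical helper: one line per <p>, stopping at the disclaimer.
function paragraphsToText(string $html, string $stopAt = 'Legal Disclaimer:'): string
{
    libxml_use_internal_errors(true);   // tolerate messy email markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $lines = [];
    foreach ($dom->getElementsByTagName('p') as $node) {
        if (strpos($node->nodeValue, $stopAt) !== false) {
            break;
        }
        // nodeValue contains &nbsp; already decoded to U+00A0; treat it as a space.
        $line = trim(str_replace("\u{00A0}", ' ', $node->nodeValue));
        if ($line !== '') {
            $lines[] = $line;
        }
    }
    return implode("\n", $lines);
}
```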
Djonatan

This example stores the text from the HTML in an array of strings.

After stripping all the tags, you can use preg_split() with the \R escape sequence (which matches any newline sequence) to convert the string into an array. That array will contain several blank values, as well as some HTML non-breaking space entities, so we remove the empty values with the array_filter() function (it removes every item that does not satisfy the filter condition; in our case, an empty value). The &nbsp; entity is a problem here: a non-breaking space and an ordinary space are different characters (U+00A0 vs U+0020), and trim() removes neither the literal "&nbsp;" text nor the decoded U+00A0 character. There are two possible solutions: the first (uncommented) one only replaces &nbsp; and then trims whitespace, while the second (commented) one decodes all HTML entities, replaces the UTF-8 non-breaking space with a regular space, and then trims.

PHP:

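$text = file_get_contents( 'http://pastebin.com/raw.php?i=2Zgbs7hi' );
$text = strip_tags( $text );

// array_filter() passes items by value, so clean every line with array_map() first
$array = array_map(
    function( $item ) {

        $item = str_replace( '&nbsp;', ' ', $item );
        return trim( $item );

        // $item = html_entity_decode( $item );
        // return trim( str_replace( "\xC2\xA0", ' ', $item ) );

    },
    preg_split( '/\R/', $text )
);

// then drop the lines that ended up empty (the original keys are preserved)
$array = array_filter( $array, function( $item ) {
    return $item !== '';
} );

foreach( $array as $value ) {
    echo $value . '<br />';
}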

Array output:

Array
(
    [8] => Hi,
    [11] => Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
    [13] => Regards
    [23] => Legal Disclaimer:
    [24] => This email and its attachments are confidential. If you received it by mistake, please don’t share it. Let us know and then delete it. Its content does not necessarily represent the views of The Dragon Enterprise
    [25] =>  Centre and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
)

Now you should have a clean array containing only items that have a value. By the way, line breaks in HTML are expressed with <br />, not with \n: your example, when sent as a response to a web browser, still contains the \n characters, but they are only visible in the page source. I hope I did not miss the point of the question.
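A quick check of the two behaviours this answer relies on, using small made-up strings: \R treats "\r\n" as a single line break, and a decoded &nbsp; survives trim().

```php
<?php
// \R matches any newline sequence, so "\r\n" counts as one break, not two.
$text  = "Hi,\r\n\r\nRegards\nBye";
$lines = preg_split('/\R/', $text);      // ["Hi,", "", "Regards", "Bye"]

// Decoding &nbsp; yields U+00A0 (bytes C2 A0 in UTF-8), not an ordinary space.
$nbsp = html_entity_decode('&nbsp;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($nbsp === "\xC2\xA0");                               // bool(true)
var_dump(trim($nbsp) === '');                                 // bool(false): trim() keeps it
var_dump(trim(str_replace("\xC2\xA0", ' ', $nbsp)) === '');   // bool(true)
```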

Danijel

Try this to get text output with line breaks:

<?php
$ticket["summary"]  = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');

$TicketSummaryDisplay = nl2br($ticket["summary"]);

echo strip_tags($TicketSummaryDisplay,'<br>');


?>
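A sketch of what nl2br() followed by strip_tags($html, '<br>') produces, plus one extra step that is an assumption and not part of the answer: it collapses long runs of <br /> so that several empty lines become a single blank line. stripKeepingBreaks() is a hypothetical name. If the source has lines that contain only &nbsp;, those would interrupt the runs, so they would need to be removed first.

```php
<?php
// Hypothetical helper: keep line breaks as <br />, but no more than two in a row.
function stripKeepingBreaks(string $html): string
{
    // nl2br() inserts "<br />" before every newline; allowing '<br>' in
    // strip_tags() keeps those (it also covers the self-closing form).
    $out = strip_tags(nl2br($html), '<br>');

    // Three or more consecutive breaks (with any whitespace between them)
    // become exactly two, i.e. a single blank line in the browser.
    return preg_replace('~(?:\s*<br\s*/?>\s*){3,}~i', "<br />\n<br />\n", $out);
}

echo stripKeepingBreaks("<p>Hi,</p>\n\n\n\n<p>Regards</p>");
```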

You are asking how to add line breaks to your "one big block of text with no line breaks at all".

Short answer

  • After you have stripped the HTML tags, apply wordwrap() with the desired line length:
  • $text = wordwrap($text, 90, "<br />\n");
  • I really wonder why nobody suggested that function before.
  • There is also chunk_split(), which doesn't take words into account and just splits after a certain number of characters, breaking words. That's not what you want, I guess.
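A small comparison of the two functions on a made-up sentence, showing the difference described in the bullets above:

```php
<?php
$text = 'The quick brown fox jumps';

// wordwrap() breaks only at spaces, so no line exceeds 12 characters and no
// word is cut (the fourth argument, $cut_long_words, defaults to false).
$wrapped = wordwrap($text, 12, "\n");

// chunk_split() inserts the separator after every 12 characters, even
// mid-word, and also appends it after the last chunk.
$chunked = chunk_split($text, 12, "\n");

echo $wrapped, "\n---\n", $chunked;
```

In the code below the break string is "<br />\n" instead of "\n", so the browser shows the wrapped lines too.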

PHP

<?php
$text = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');

/**
 * Returns the string without HTML tags; also takes control characters,
 * spaces and "&nbsp;" into account.
 */
function dropHtmlTags($string) {

    // remove html tags
    //$string = preg_replace ('/<[^>]*>/', ' ', $string);
    $string = strip_tags($string);

    // control characters and "&nbsp;"
    $string = str_replace("\r", '', $string);    // remove
    $string = str_replace("\n", ' ', $string);   // replace with space
    $string = str_replace("\t", ' ', $string);   // replace with space
    $string = str_replace("&nbsp;", ' ', $string);

    // remove multiple spaces
    $string = preg_replace('/ {2,}/', ' ', $string);
    $string = trim($string);

    return $string;

}

$text = dropHtmlTags($text);

// The Answer: insert line breaks after 95 chars,
// to get rid of the "one big block of text with no line breaks at all"
$text = wordwrap($text, 95, "<br />\n");

// if you want to insert line-breaks before the legal disclaimer, 
// uncomment the next line
//$text = str_replace("Regards Legal Disclaimer", "<br /><br />Regards Legal Disclaimer", $text);

echo $text;
?>

Result

  • first section shows your text block
  • second section shows the text with wordwrap applied (code from above)


Jens A. Koch

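Hello, it can be done as follows:

$abc = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');

// strip_tags() takes the string first (the allowed tags are the optional
// second argument); the newlines in the text are kept
$abc = strip_tags($abc);

echo $abc;

Please let me know whether it works.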
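Note that strip_tags() takes the HTML string as its first argument and the optional allowed tags as the second. A small demo on a made-up string shows what happens when the two are swapped:

```php
<?php
$html = "<p>Hi,</p>\n<p>Regards</p>";

// Arguments swapped: "\n" is treated as the text and $html as the allowed
// tags, so the result is just "\n" and the email content is lost.
$wrong = strip_tags("\n", $html);

// Correct order: the tags are removed, the newline between paragraphs stays.
$right = strip_tags($html);

var_dump($wrong, $right);
```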

Yogesh Pawar

You may use:

<?php
$a= file_get_contents('a.txt');
echo nl2br(htmlspecialchars($a));
?>
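Note that htmlspecialchars() escapes the markup rather than removing it, so this displays the raw HTML source with visible line breaks. A small comparison on a made-up string, with strip_tags() added for the plain-text case:

```php
<?php
$a = "<p>Hi,</p>\n<p>Regards</p>";

// Escapes the tags: the browser shows "<p>Hi,</p>" literally, one line each.
$escaped = nl2br(htmlspecialchars($a));

// Strips the tags first, so only the text and its line breaks remain.
$plain = nl2br(htmlspecialchars(strip_tags($a)));

echo $escaped, "\n", $plain;
```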
<?php
// fgetss() was deprecated in PHP 7.3 and removed in PHP 8.0,
// so strip the tags from the whole file with strip_tags() instead.
$html = @file_get_contents("pastebin.html");
if ($html !== false) {
    echo strip_tags($html);
}
?>

The output is:

Hi,

&nbsp;
Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
&nbsp;
Regards
&nbsp;


&nbsp;

&nbsp;

&nbsp;
&nbsp;
Legal Disclaimer:
This email and its attachments are confidential. If you received it by mistake, please don&#8217;t share it. Let us know and then delete it. Its content does not necessarily represent the views of&nbsp;The Dragon Enterprise
 Centre&nbsp;and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
&nbsp;

&nbsp;
&nbsp;
&nbsp;

You can probably write additional code to convert &nbsp; to spaces, etc.
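One way to write that additional code, as a sketch: decode the entities, replace the decoded non-breaking space (U+00A0) with an ordinary space, and drop the lines that end up empty. cleanUpText() is a hypothetical name, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: clean up the entity-laden text left after stripping tags.
function cleanUpText(string $text): string
{
    // Decode &nbsp;, &#8217; and similar entities into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // A decoded &nbsp; is U+00A0, which trim() does not strip; use a normal space.
    $text = str_replace("\u{00A0}", ' ', $text);

    // Trim every line and drop the ones that are now empty.
    $lines = array_map('trim', explode("\n", $text));
    $lines = array_filter($lines, function ($line) {
        return $line !== '';
    });

    return implode("\n", $lines);
}
```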

crafter
  • Why the downvote? Did the code not give the required result? I'm confused about what is required here. OP said "And i want to be able to remove the HTML code from it and just display the plain text ..." – crafter Apr 03 '14 at 21:22

I'm not sure I understood everything correctly, but this seems to be your expected result:

$txt = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');

var_dump(preg_replace('/(&nbsp;\s*)+/', "\n", trim(strip_tags(preg_replace('/\s+/', ' ', $txt)))));


// More readable version of the same steps:

$txt = preg_replace('/\s+/', ' ', $txt);            // collapse all whitespace to single spaces
$txt = trim(strip_tags($txt));                      // remove the tags and trim the ends
$txt = preg_replace('/(&nbsp;\s*)+/', "\n", $txt);  // each run of &nbsp; becomes one newline
var_dump($txt);
ilpaijin

The strip_tags() function strips HTML and PHP tags from a string, if that is what you are trying to accomplish.

Examples from the docs:

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

The above example will output:

Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>
Patrick M
money