3

I'm trying to convert my sample HTML output into plain text, but I don't know how. I use file_get_contents, but the page I'm trying to convert comes back almost unchanged.

$raw = "http://localhost/guestbook/profiles.php";
$file_converted = file_get_contents($raw);
echo $file_converted;

profiles.php

<html>
    <head>
        <title>Profiles - GuestBook</title>
        <link rel="stylesheet" type="text/css" href="css/style.css">
    </head>
<body>
    <!-- Some Divs -->
    <div id="profile-wrapper">
        <h2>Profile</h2>
        <table>
            <tr>
                <td>Name:</td><td> John Dela Cruz</td>
            </tr>
            <tr>
                <td>Age:</td><td>15</td>
            </tr>
            <tr>
                <td>Location:</td><td> SomewhereIn, Asia</td>
            </tr>
        </table>
    </div>
</body>
</html>

Basically, I'm trying to echo out something like this (plain text, no styles):

Profile
Name: John Dela Cruz
Age: 15
Location: SomewhereIn, Asia

but I don't know how. :-( Please help me, guys. Thank you in advance.

EDIT: Since I am only after the content of the page, whether it is styled or plain text, is there a way to select only the part shown below using file_get_contents()?

<h2>Profile</h2>
<table>
    <tr>
        <td>Name:</td><td> John Dela Cruz</td>
    </tr>
    <tr>
        <td>Age:</td><td>15</td>
    </tr>
    <tr>
        <td>Location:</td><td> SomewhereIn, Asia</td>
    </tr>
</table>
Adrian Heine
Dan
  • Thanks for the quick response, Sergej Jevsejev, josnidhin, and Jonathan Rich. Much appreciated. :-) – Dan Dec 06 '11 at 16:09
  • 1
    Note that people saying to use `strip_tags` don't fully understand it or are being careless. It will leave your title intact, as well as any inline stylesheets or JavaScript. You don't have any of the latter, but you do have a title... – Levi Morrison Dec 06 '11 at 16:09
  • I used strip_tags; it removes the HTML tags, but it returns undesirable output, e.g. { font: bold 11px Lucida Grande, Lucida Sans Unicode, Trebuchet MS, Helvetica, Arial, sans-serif; color: #045877; padding: 15px 0 0 12px; text-decoration: none; display: block; margin: 0 auto; } – Dan Dec 06 '11 at 16:14
  • Maybe strip_tags does answer the title of my question, since what it returns is plain text, but it doesn't give me the output I want. – Dan Dec 06 '11 at 16:42
  • Use strip_tags, or if you need plain text with HTML tags, then: http://browse-tutorials.com/snippet/convert-text-or-html-plain-text-php – ram4nd Nov 25 '13 at 19:47

5 Answers

6

Use PHP's strip_tags.

If strip_tags is not working for you, then maybe you can use a regex to extract the info you want.

Try using PHP's preg_match_all (preg_match stops at the first match) with /(<td>.*?<\/td>)/ as the pattern.
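
As a rough sketch of the regex approach, assuming the page markup looks like the table in the question: the snippet below uses preg_match_all with the capture group moved inside the <td> tags, so only the cell text is captured. The $html sample and the pairing of cells into label/value rows are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical sample standing in for the markup fetched from profiles.php.
$html = <<<'HTML'
<h2>Profile</h2>
<table>
    <tr>
        <td>Name:</td><td> John Dela Cruz</td>
    </tr>
    <tr>
        <td>Age:</td><td>15</td>
    </tr>
    <tr>
        <td>Location:</td><td> SomewhereIn, Asia</td>
    </tr>
</table>
HTML;

// Capture the text inside every <td>; the "s" flag lets "." span newlines.
preg_match_all('/<td>(.*?)<\/td>/s', $html, $matches);

// Trim each cell, then pair the cells up as label/value rows.
$cells = array_map('trim', $matches[1]);
$lines = array();
foreach (array_chunk($cells, 2) as $pair) {
    $lines[] = implode(' ', $pair);
}

echo implode("\n", $lines), "\n";
```

A regex like this only holds up while the cells stay simple; nested tags or attributes on the <td> elements would need a real HTML parser.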

Josnidhin
  • Exactly, and if you do not need white spaces or other symbols (chars), refer to [trim](http://php.net/manual/en/function.trim.php). – Rolice Dec 06 '11 at 16:09
  • 1
    This is not `exactly`. It will leave his title in the plain-text. – Levi Morrison Dec 06 '11 at 16:10
  • I used strip_tags; it removes the HTML tags, but it returns undesirable output, e.g. { font: bold 11px Lucida Grande, Lucida Sans Unicode, Trebuchet MS, Helvetica, Arial, sans-serif; color: #045877; padding: 15px 0 0 12px; text-decoration: none; display: block; margin: 0 auto; } – Dan Dec 06 '11 at 16:15
2

Have a look at simplexml_load_file():

http://www.php.net/manual/en/function.simplexml-load-file.php

It will allow you to load the HTML data into an object (SimpleXMLElement) and traverse that object like a tree, provided the markup is well-formed XML.
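
One caveat: simplexml_load_file() expects well-formed XML, and the page in the question has an unclosed <link> tag, so loading it directly is likely to fail. A possible workaround, swapped in here rather than taken from this answer, is to parse with DOMDocument::loadHTML() and convert the result with simplexml_import_dom(). The inline $html sample is a shortened stand-in for profiles.php.

```php
<?php
// Hypothetical, shortened stand-in for the output of profiles.php.
$html = '<html><head><title>Profiles - GuestBook</title>'
      . '<link rel="stylesheet" type="text/css" href="css/style.css"></head>'
      . '<body><div id="profile-wrapper"><h2>Profile</h2><table>'
      . '<tr><td>Name:</td><td> John Dela Cruz</td></tr>'
      . '<tr><td>Age:</td><td>15</td></tr>'
      . '</table></div></body></html>';

// DOMDocument::loadHTML() tolerates HTML that is not well-formed XML.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Hand the parsed tree to SimpleXML and locate the profile wrapper.
$xml = simplexml_import_dom($doc);
$found = $xml->xpath('//div[@id="profile-wrapper"]');
$wrapper = $found[0];

// Start with the heading, then add one "label value" line per table row.
$lines = array(trim((string) $wrapper->h2));
foreach ($wrapper->xpath('.//tr') as $row) {
    $label = trim((string) $row->td[0]);
    $value = trim((string) $row->td[1]);
    $lines[] = $label . ' ' . $value;
}

echo implode("\n", $lines), "\n";
```

Selecting the wrapper by its id keeps the title and stylesheet link out of the result without any string cleanup.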

Jonathan Rich
1

Try the PHP function strip_tags.

Sergej Brazdeikis
  • 1
    I used strip_tags; it removes the HTML tags, but it returns undesirable output, e.g. { font: bold 11px Lucida Grande, Lucida Sans Unicode, Trebuchet MS, Helvetica, Arial, sans-serif; color: #045877; padding: 15px 0 0 12px; text-decoration: none; display: block; margin: 0 auto; } – Dan Dec 06 '11 at 16:15
1

Try this one:

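<?php
// "your_file" is a placeholder for the page URL or file path.
$data = file_get_contents("your_file");
// Match each <div>...</div>; "s" lets "." span newlines, "i" ignores case.
preg_match_all('|<div[^>]*?>(.*?)</div>|si', $data, $result);
// $result[0] holds the full matches (tags included); $result[1] holds only the inner HTML.
print_r($result[0][0]);
?>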

I have tried this, and it seems to work for me; I hope it works for you too.

Khairu Aqsara
0

You can use the PHP strip_tags function for this. Browse the comments on the strip_tags page of the PHP manual to see how to use it well.
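
One way to use strip_tags so that the title and stylesheet text do not leak into the output, sketched under a few assumptions: the <head>, <style>, and <script> elements are removed with a regex first, and a space is added before each </td> so adjacent cells do not run together. The $html sample (including its inline <style> block) is hypothetical, and the regex cleanup is a simplification that a DOM parser would handle more reliably.

```php
<?php
// Hypothetical sample: a shortened profiles.php with an inline <style> block.
$html = <<<'HTML'
<html>
    <head>
        <title>Profiles - GuestBook</title>
        <style>h2 { font: bold 11px Arial; color: #045877; }</style>
    </head>
<body>
    <div id="profile-wrapper">
        <h2>Profile</h2>
        <table>
            <tr>
                <td>Name:</td><td> John Dela Cruz</td>
            </tr>
            <tr>
                <td>Age:</td><td>15</td>
            </tr>
        </table>
    </div>
</body>
</html>
HTML;

// Drop whole elements whose text content should not appear in the output.
$clean = preg_replace('#<(head|style|script)\b[^>]*>.*?</\1>#is', '', $html);

// Adjacent cells would otherwise run together ("Age:15"), so pad cell ends.
$clean = str_replace('</td>', ' </td>', $clean);

// strip_tags() removes the remaining tags but keeps the text between them.
$text = strip_tags($clean);

// Collapse whitespace runs, trim each line, and discard the empty ones.
$lines = array();
foreach (explode("\n", $text) as $line) {
    $line = trim(preg_replace('/\s+/', ' ', $line));
    if ($line !== '') {
        $lines[] = $line;
    }
}

echo implode("\n", $lines), "\n";
```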

Iljaas