I am trying to fetch the title and body of a URL. For this purpose, I am using the Simple HTML DOM parser library.

This is my code:

<?php
    error_reporting(E_ALL);
    ini_set('display_errors', 1);
    header('Content-Type: text/plain; charset=utf-8');
    header('Access-Control-Allow-Origin: *');
    header('Access-Control-Allow-Methods: POST, GET, OPTIONS');
    
    include('simple_html_dom.php');

    if(isset($_POST["url"])){
        $html = file_get_html(dirname($_POST["url"])."/".urlencode(basename($_POST["url"])));
        echo $html->find('title',0)->plaintext;
    }

?>

I get a response like this:

"\u0938\u094d\u091f\u0947\u0936\u0928\u094b\u0902 \u0915\u0947 \u092c\u0940\u091a \u0928\u0935..."

How do I get the original string?

  • Can you provide an example of the original text? – Jacob Mulquin Dec 03 '22 at 07:40
  • Maybe you can look into [mb_convert_encoding](https://www.php.net/manual/en/function.mb-convert-encoding.php)? – magicianiam Dec 03 '22 at 07:54
  • If that is the response, it *is* the original string. Perhaps the library you use is broken; see [the reference question here on SO for your options on how to parse and process HTML with PHP](https://stackoverflow.com/q/3577641/367456). Otherwise, including the double quotes, it looks like JSON text. Have you tried to decode it? – hakre Dec 03 '22 at 07:58
  • The `codepoint_decode()` function from the following answer will work: https://stackoverflow.com/a/24763655/1427345 – Jacob Mulquin Dec 03 '22 at 08:54
  • The text is in Hindi. –  Dec 04 '22 at 02:49
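
One comment above notes that the quoted response looks like JSON text. A minimal sketch of that idea, assuming the response really is a JSON-encoded string: `json_decode()` turns the `\uXXXX` escapes back into UTF-8 characters. The sample value is the response from the question with the elided `...` tail left out, and `decode_json_text()` is a hypothetical helper name, not part of Simple HTML DOM.

```php
<?php
// Sample taken from the question's output. The trailing "..." there is
// elided text, so only the visible part is used here.
$response = '"\u0938\u094d\u091f\u0947\u0936\u0928\u094b\u0902 \u0915\u0947 \u092c\u0940\u091a"';

// Hypothetical helper: decode a JSON-encoded string, or return the input
// unchanged when it is not valid JSON.
function decode_json_text(string $text): string
{
    $decoded = json_decode($text);
    if (!is_string($decoded) || json_last_error() !== JSON_ERROR_NONE) {
        return $text;
    }
    return $decoded;
}

$decoded = decode_json_text($response);

// Prints the readable UTF-8 text instead of \uXXXX escapes.
echo $decoded, PHP_EOL;

// If the escapes come from a json_encode() call on the way out, the
// JSON_UNESCAPED_UNICODE flag keeps the characters as they are.
echo json_encode($decoded, JSON_UNESCAPED_UNICODE), PHP_EOL;
```

If the escapes are produced somewhere else (for example by the client that displays the response), decoding on the PHP side will not change anything, so it is worth checking where the quotes and escapes are added first.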