I am using the following function in PHP to trim some unwanted characters.

$inputString = "आनन्द मठ";
trim(html_entity_decode($inputString), " \t\n\r\0\x0B\xC2\xA0");

The above code works fine in most cases, but for one input string (आनन्द मठ) it returns आनन्द म�, with an unwanted � at the end. The same happens for परेटो- श्रेष्ठ, which becomes परेटो- श्रेष्�.

Harshit

2 Answers

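trim()

This function works byte by byte and is not aware of UTF-8. It treats the character list "\xC2\xA0" as two separate bytes, 0xC2 and 0xA0, rather than as one no-break space. The UTF-8 encoding of ठ (U+0920) is E0 A4 A0, so trim() strips its final byte 0xA0 and leaves an invalid sequence, which is displayed as �.

You need a UTF-8 (Unicode) aware function instead. Try this one: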

function mb_trim($string, $charlist='\\\\s', $ltrim=true, $rtrim=true) 
{ 
    $both_ends = $ltrim && $rtrim; 

    $char_class_inner = preg_replace( 
        array( '/[\^\-\]\\\]/S', '/\\\{4}/S' ), 
        array( '\\\\\\0', '\\' ), 
        $charlist 
    ); 

    $work_horse = '[' . $char_class_inner . ']+'; 
    $ltrim && $left_pattern = '^' . $work_horse; 
    $rtrim && $right_pattern = $work_horse . '$'; 

    if($both_ends) 
    { 
        $pattern_middle = $left_pattern . '|' . $right_pattern; 
    } 
    elseif($ltrim) 
    { 
        $pattern_middle = $left_pattern; 
    } 
    else 
    { 
        $pattern_middle = $right_pattern; 
    } 

    return preg_replace("/$pattern_middle/usSD", '', $string); 
} 
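
If all you need is to strip whitespace, NUL, and the no-break space from both ends, a shorter variant may be enough. This is only a sketch under that assumption: the name `trim_unicode_whitespace` is hypothetical, and the character class simply mirrors the list passed to trim() in the question. The `/u` modifier makes the regex work on whole UTF-8 characters, so the 0xA0 byte inside ठ is never matched on its own.

```php
<?php
// Hypothetical helper, not part of PHP's standard library.
function trim_unicode_whitespace(string $text): string
{
    // \s      : whitespace (space, tab, newline, carriage return, ...)
    // \x0B    : vertical tab, listed explicitly for older PCRE versions
    // \x{00A0}: no-break space as a single code point
    // \0      : NUL byte
    // /u makes PCRE treat the pattern and the subject as UTF-8.
    $class  = '[\s\x0B\x{00A0}\0]+';
    $result = preg_replace('/\A' . $class . '|' . $class . '\z/u', '', $text);

    // preg_replace() returns null on error (for example invalid UTF-8 input);
    // in that case return the input unchanged.
    return $result ?? $text;
}

$inputString = "\u{00A0} आनन्द मठ \t";
echo trim_unicode_whitespace(html_entity_decode($inputString)), "\n";
```

The `"\u{00A0}"` escape needs PHP 7 or later; on older versions write the no-break space as `"\xC2\xA0"` in the input string.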
le Mandarin

Add an HTTP header in your PHP script, for example:

header("Content-Type: text/html; charset=ISO-8859-1");

Or put the encoding in a meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Shubham Dixit