-3

How can I remove these unwanted characters like �������?

I have already set the character encoding to utf-8, but these characters still appear.

If a person copies text from Word and pastes it into TinyMCE, the unwanted characters do not appear before the text is saved to the database. Once it has been saved and is fetched from the database again, the unwanted characters appear.

Here's my current code for filtering:

$content = htmlentities(@iconv("UTF-8", "ISO-8859-1//IGNORE", $content));

This helps, but some of the unwanted characters are still not filtered out.

hakre
naviciroel
  • Where did you get the characters from? What was the encoding of the source data? – Wooble Mar 13 '12 at 11:50
  • Are you sure PHP is outputting these characters? It may be that your browser is displaying them wrong. Can you post the code that generates them? – vascowhite Mar 13 '12 at 12:00
  • Needs more context. Show us a hex dump of the unprocessed input. – Deestan Mar 13 '12 at 12:09
  • I'm using TinyMCE. I paste text from Word into the TinyMCE form, and when it is saved those characters appear. I already tried str_replace, but it falls short: there are too many characters to trap. – naviciroel Mar 13 '12 at 12:44
  • You are probably telling your database that the string you send it is not UTF-8 encoded, or your database is not able to store it. What is the original string you pasted into TinyMCE? What is the [hexdump of the string](http://stackoverflow.com/questions/1057572/how-can-i-get-a-hex-dump-of-a-string-in-php) with the question marks? – hakre Mar 13 '12 at 13:01
  • I also tried this: $text = iconv("UTF-8","UTF-8//IGNORE",$text); – naviciroel Mar 13 '12 at 13:07
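The hex dump that the comments ask for can be produced with PHP's built-in bin2hex(). The sketch below is only an illustration: the helper names hex_dump_string and is_valid_utf8 are made up for this example, and the sample strings are assumptions, not the asker's real data. It also uses the empty-pattern preg_match('//u', ...) idiom to check whether a string is valid UTF-8 at all.

```php
<?php
// Hypothetical helper: space-separated hex bytes of a string.
function hex_dump_string(string $s): string {
    return implode(' ', str_split(bin2hex($s), 2));
}

// Hypothetical helper: true if $s is a valid UTF-8 byte sequence.
// With the /u modifier, preg_match() fails on malformed UTF-8 subjects.
function is_valid_utf8(string $s): bool {
    return preg_match('//u', $s) === 1;
}

$utf8Sample   = "\xE2\x80\x9CHi\xE2\x80\x9D"; // smart-quoted Hi, encoded as UTF-8
$cp1252Sample = "\x93Hi\x94";                 // the same text, encoded as Windows-1252

echo hex_dump_string($utf8Sample), "\n";   // e2 80 9c 48 69 e2 80 9d
echo hex_dump_string($cp1252Sample), "\n"; // 93 48 69 94
var_dump(is_valid_utf8($utf8Sample));      // bool(true)
var_dump(is_valid_utf8($cp1252Sample));    // bool(false)
```

Comparing the dump taken before the insert with the dump of what comes back from the database shows at which step the bytes change.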

4 Answers

1

You can remove these characters by simply not outputting them. Yes, that works.

If you need a more specific guideline, then you need to be more specific in your question. So far you have shared only this:

I have already set the character encoding to utf-8

What's missing is what that character encoding applies to. Is it the output? Is it the string itself (there must be some string somewhere)? Is it the input?

You need to a) share your code to make clear what is causing this and b) share the encoding of any string that is related to your code.
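To illustrate why it matters which layer an encoding applies to, here is a minimal sketch of how this kind of garbage can arise. It assumes, purely for the example, that some layer reads UTF-8 bytes as Windows-1252; the thread does not confirm that this is what happens in the asker's setup.

```php
<?php
// UTF-8 bytes of a left double smart quote (U+201C), as pasted from Word.
$utf8 = "\xE2\x80\x9C";

// Assumed faulty step: a layer treats those three bytes as Windows-1252
// text and converts them to UTF-8 again, so each byte becomes a character.
$garbled = iconv('Windows-1252', 'UTF-8', $utf8);

echo bin2hex($utf8), "\n";    // e2809c
echo bin2hex($garbled), "\n"; // c3a2e282acc593

// Reversing the wrong step recovers the original bytes.
$restored = iconv('UTF-8', 'Windows-1252', $garbled);
var_dump($restored === $utf8); // bool(true)
```

One character has turned into three, which is why filtering the output afterwards is harder than making every layer (form, PHP, database connection, table, page) agree on one encoding.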

hakre
0

Why don't you just work backwards? Remove all "non word" characters with this regex:

$cleanStr = preg_replace('/\W/', '', $yourInput);

Alternatively, you could be more explicit with '/[^a-zA-Z0-9_]/', but \W is shorthand for that same character class.
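As a rough sketch of what this regex does in practice: '/\W/' also strips spaces and punctuation, so it may remove more than intended. The helper strip_symbols below is a hypothetical variant, not part of the answer, that keeps letters and digits from any script plus whitespace. It assumes the input is UTF-8.

```php
<?php
// '/\W/' removes everything that is not a letter, digit, or underscore,
// including the spaces between words.
echo preg_replace('/\W/', '', 'Hello, world!'), "\n"; // Helloworld

// Hypothetical helper: keep Unicode letters (\p{L}), digits (\p{N}) and
// whitespace. The /u modifier makes PCRE treat pattern and subject as UTF-8.
// preg_replace() returns null if the subject is not valid UTF-8.
function strip_symbols(string $text): ?string {
    return preg_replace('/[^\p{L}\p{N}\s]/u', '', $text);
}

echo strip_symbols('Café, déjà vu!'), "\n"; // Café déjà vu
```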

dcbarans
0

Here are a bunch of ways to clean unwanted characters that I've used in the past. (Keep in mind that I also run mysql_real_escape_string when doing MySQL stuff.)

//////////////////////////////////////////////////////////////////////////////////
// FUNCTION:     cleaner
// DESCRIPTION: Used mainly to clean large chunks of copy and pasted copy from 
//              word and on macs
//////////////////////////////////////////////////////////////////////////////////
function cleaner($some_var){
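  // Search strings are the UTF-8 byte sequences of the characters named in the comments.
  $find[]    = "\xE2\x80\x9C";  // left side double smart quote
  $find[]    = "\xE2\x80\x9D";  // right side double smart quote
  $find[]    = "\xE2\x80\x98";  // left side single smart quote
  $find[]    = "\xE2\x80\x99";  // right side single smart quote
  $find[]    = "\xE2\x80\xA6";  // ellipsis
  $find[]    = "\xE2\x80\x94";  // em dash
  $find[]    = "\xE2\x80\x93";  // en dash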
  $replace[] = '"';
  $replace[] = '"';
  $replace[] = "'";
  $replace[] = "'";
  $replace[] = "...";
  $replace[] = "-";
  $replace[] = "-";

  return(str_replace($find, $replace, trim($some_var)));
} 

//////////////////////////////////////////////////////////////////////////////////
// FUNCTION:     strip_accents
// DESCRIPTION: Used to replace all characters shown below
//////////////////////////////////////////////////////////////////////////////////
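function strip_accents($some_var){ 
  // Assumes UTF-8 input and a UTF-8 source file. The three-argument form of strtr() works byte by byte and would corrupt multi-byte characters, so build a character map instead.
  $from = preg_split('//u', 'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ', -1, PREG_SPLIT_NO_EMPTY);
  $to   = str_split('aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
  return strtr($some_var, array_combine($from, $to)); 
}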

//////////////////////////////////////////////////////////////////////////////////
// FUNCTION:     clean_text
// DESCRIPTION: Used to replace all characters but the below
//////////////////////////////////////////////////////////////////////////////////
function clean_text($some_var){
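  // preg_replace() replaces ereg_replace(), which was removed in PHP 7.0; '#' is the delimiter because the class contains '/'.
  $new_string = preg_replace("#[^A-Za-z0-9:/.' @-]#", "", strip_accents(trim($some_var))); 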
  return $new_string;
}

//////////////////////////////////////////////////////////////////////////////////
// FUNCTION:     clean_url
// DESCRIPTION: Strips all non alpha-numeric values from a field and formats the 
//              variable into a URL friendly variable
//////////////////////////////////////////////////////////////////////////////////
function clean_url($var){
    $find[]    = " ";
    $find[]    = "&";
    $replace[] = "-";
    $replace[] = "-and-";

  // Keep only letters, digits and the hyphens produced by the replacements above.
  $new_string = preg_replace("/[^a-zA-Z0-9-]/", "", str_replace($find, $replace, strtolower(strip_accents(trim($var)))));
  return($new_string);
}

//////////////////////////////////////////////////////////////////////////////////
// FUNCTION:     post_cleaner
// DESCRIPTION: Another scrubber to remove tags and clean post data
//////////////////////////////////////////////////////////////////////////////////
function post_cleaner($var, $max = 75, $case="default"){
  switch($case):
    case "email":
      break;

    case "money":
      $var = preg_replace("/[^0-9. -]/", "", strip_accents(trim($var))); 
      break;

    case "number":
      $var = preg_replace("/[^0-9. -]/", "", strip_accents(trim($var))); 
      break;

    case "name":
      $var = preg_replace("#[^A-Za-z0-9/.' @-]#", "", strip_accents(trim($var))); 
      $var = ucwords($var); 
      break;

    default:
      // $var = trim($var);
      // $var = htmlspecialchars($var);
      // $var = mysql_real_escape_string($var);
      // $var = substr($var, 0, $max);
      $var = substr(clean_text($var), 0, $max);
  endswitch;

  return $var;
}

These are just a few of the many ways to clean text. Take what you want from it. Hope it helps.
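One design point worth a sketch: str_replace() with arrays applies the replacements one after another, so an early replacement can alter text that a later search string was supposed to match. strtr() with an array does everything in a single pass. The function normalize_word_punctuation below is a hypothetical illustration, not part of the answer. It assumes UTF-8 input and PHP 7.0 or later (for the "\u{...}" escapes).

```php
<?php
// Hypothetical helper: map Word's "smart" punctuation to plain ASCII.
function normalize_word_punctuation(string $text): string {
    // strtr() with an array replaces all keys in one pass, so no
    // replacement can corrupt a later search string.
    return strtr($text, [
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark
        "\u{2026}" => '...', // horizontal ellipsis
        "\u{2014}" => '-',   // em dash
        "\u{2013}" => '-',   // en dash
    ]);
}

echo normalize_word_punctuation("\u{201C}It\u{2019}s done\u{201D} \u{2013} almost\u{2026}"), "\n";
// "It's done" - almost...
```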

BenMorel
maximo
-2

Maybe with str_replace()? I can't see the characters you're using.

$badChars = array('$', '@', '~', 'R', '¬');

$string = str_replace($badChars, '', $string); // str_replace() returns the result, so assign it
SuperSpy