How to remove all non-printable characters in a string?

Question

I imagine I need to remove chars 0-31 and 127.

Is there a function or piece of code to do this efficiently?

score 453 · Accepted Answer · edited May 23 '17 at 11:55

7 bit ASCII?

If your Tardis just landed in 1963, and you just want the 7 bit printable ASCII chars, you can rip out everything from 0-31 and 127-255 with this:

$string = preg_replace('/[\x00-\x1F\x7F-\xFF]/', '', $string);

It matches anything in range 0-31, 127-255 and removes it.

8 bit extended ASCII?

You fell into a Hot Tub Time Machine, and you're back in the eighties. If you've got some form of 8 bit ASCII, then you might want to keep the chars in range 128-255. An easy adjustment: just look for 0-31 and 127.

$string = preg_replace('/[\x00-\x1F\x7F]/', '', $string);

UTF-8?

Ah, welcome back to the 21st century. If you have a UTF-8 encoded string, then the /u modifier can be used on the regex:

$string = preg_replace('/[\x00-\x1F\x7F]/u', '', $string);

This just removes 0-31 and 127. This works in ASCII and UTF-8 because both share the same control set range (as noted by mgutt below). Strictly speaking, this would work without the /u modifier. But it makes life easier if you want to remove other chars...

If you're dealing with Unicode, there are potentially many non-printing elements, but let's consider a simple one: NO-BREAK SPACE (U+00A0)

In a UTF-8 string, this would be encoded as the two bytes 0xC2 0xA0. You could look for and remove that specific sequence, but with the /u modifier in place, you can simply add \xA0 to the character class:

$string = preg_replace('/[\x00-\x1F\x7F\xA0]/u', '', $string);
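
To see what the /u modifier changes, here is a small sketch. The sample strings are made up for illustration, and the "\u{...}" escapes assume PHP 7.0 or later. With /u, \xA0 in the class is read as the code point U+00A0; without it, \xA0 is the single byte 0xA0, which also occurs inside other UTF-8 sequences such as à (0xC3 0xA0).

```php
<?php
// Sample input: NO-BREAK SPACE (U+00A0), a BEL control char, and a euro sign (U+20AC).
$input = "price:\u{00A0}10\x07 \u{20AC}";

// With /u the class works on code points, so only U+00A0 and BEL are removed.
$clean = preg_replace('/[\x00-\x1F\x7F\xA0]/u', '', $input);
var_dump($clean === "price:10 \u{20AC}"); // the multi-byte euro sign is left intact

// Without /u the class works on bytes: \xA0 now also hits the second byte of "à".
$broken = preg_replace('/[\xA0]/', '', "\u{00E0}");
var_dump(bin2hex($broken)); // "c3": a dangling lead byte, no longer valid UTF-8
```

So if you extend the class beyond the ASCII control range, keep /u for UTF-8 input and drop it only for single-byte encodings.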

Addendum: What about str_replace?

preg_replace is pretty efficient, but if you're doing this operation a lot, you could build an array of chars you want to remove, and use str_replace as noted by mgutt below, e.g.

//build an array we can re-use across several operations
$badchar=array(
    // control characters
    chr(0), chr(1), chr(2), chr(3), chr(4), chr(5), chr(6), chr(7), chr(8), chr(9), chr(10),
    chr(11), chr(12), chr(13), chr(14), chr(15), chr(16), chr(17), chr(18), chr(19), chr(20),
    chr(21), chr(22), chr(23), chr(24), chr(25), chr(26), chr(27), chr(28), chr(29), chr(30),
    chr(31),
    // non-printing characters
    chr(127)
);

//replace the unwanted chars
$str2 = str_replace($badchar, '', $str);

Intuitively, this seems like it would be fast, but that's not always the case, so you should definitely benchmark to see if it saves you anything. I ran some benchmarks across a variety of string lengths with random data, and this pattern emerged using PHP 7.0.12:

     2 chars str_replace     5.3439ms preg_replace     2.9919ms preg_replace is 44.01% faster
     4 chars str_replace     6.0701ms preg_replace     1.4119ms preg_replace is 76.74% faster
     8 chars str_replace     5.8119ms preg_replace     2.0721ms preg_replace is 64.35% faster
    16 chars str_replace     6.0401ms preg_replace     2.1980ms preg_replace is 63.61% faster
    32 chars str_replace     6.0320ms preg_replace     2.6770ms preg_replace is 55.62% faster
    64 chars str_replace     7.4198ms preg_replace     4.4160ms preg_replace is 40.48% faster
   128 chars str_replace    12.7239ms preg_replace     7.5412ms preg_replace is 40.73% faster
   256 chars str_replace    19.8820ms preg_replace    17.1330ms preg_replace is 13.83% faster
   512 chars str_replace    34.3399ms preg_replace    34.0221ms preg_replace is  0.93% faster
  1024 chars str_replace    57.1141ms preg_replace    67.0300ms str_replace  is 14.79% faster
  2048 chars str_replace    94.7111ms preg_replace   123.3189ms str_replace  is 23.20% faster
  4096 chars str_replace   227.7029ms preg_replace   258.3771ms str_replace  is 11.87% faster
  8192 chars str_replace   506.3410ms preg_replace   555.6269ms str_replace  is  8.87% faster
 16384 chars str_replace  1116.8811ms preg_replace  1098.0589ms preg_replace is  1.69% faster
 32768 chars str_replace  2299.3128ms preg_replace  2222.8632ms preg_replace is  3.32% faster

The timings themselves are for 10000 iterations, but what's more interesting is the relative differences. Up to 512 chars, I was seeing preg_replace always win. In the 1-8kb range, str_replace had a marginal edge.

I thought it was an interesting result, so I'm including it here. The important thing is not to take this result and use it to decide which method to use, but to benchmark against your own data and then decide.
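
If you want to run that comparison on your own data, a minimal harness could look like the sketch below. The bench() helper and the variable names are hypothetical, random_bytes() assumes PHP 7.0 or later, and the random sample is only a stand-in for your real strings. This is not the script that produced the table above.

```php
<?php
// Hypothetical helper: time $iterations calls of $fn and return elapsed milliseconds.
function bench(callable $fn, int $iterations = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) * 1000;
}

// Same characters as the $badchar array above: 0-31 plus 127.
$badchar = array_merge(array_map('chr', range(0, 31)), [chr(127)]);

// Replace this with a sample of your real data; random bytes are only a stand-in.
$sample = random_bytes(1024);

$viaStr  = function () use ($sample, $badchar) { return str_replace($badchar, '', $sample); };
$viaPreg = function () use ($sample) { return preg_replace('/[\x00-\x1F\x7F]/', '', $sample); };

// Both approaches must agree before their timings are worth comparing.
var_dump($viaStr() === $viaPreg());

printf("str_replace:  %.4fms\n", bench($viaStr));
printf("preg_replace: %.4fms\n", bench($viaPreg));
```

Run it several times and with several input sizes, since a single run is noisy.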

If you need to consider a newline safe, change the expression to this (inversely search for printables): preg_replace('/[^\x0A\x20-\x7E]/','',$string); — Nick, Sep 16 '10 at 19:56
@Dalin There is no such thing as an “UTF-8 character”. There are Unicode symbols/characters, and UTF-8 is an encoding that can represent all of them. You meant to say this doesn’t work for characters outside of the ASCII character set. — Mathias Bynens, Dec 31 '12 at 13:25
If you need to match a unicode character above \xFF, use \x{####} — Peter Olson, Jul 10 '13 at 05:09
Hi is there a way that it can preserve new lines? I'm using it and it actually deletes special characters from my string but my string is for example 20 lines. The output is now one-line (All 20 lines were combined). — kimbarcelona, Apr 08 '14 at 06:48
is an encoding, not a character. The solution above is only intended to work on ASCII characters. — Paul Dixon, May 21 '15 at 14:07
Sorry, but this answer is completely wrong, see mine: http://stackoverflow.com/a/42058165/318765 — mgutt, Feb 05 '17 at 22:42
@mgutt I've clarified the answer. See also interesting benchmark result on str_replace. — Paul Dixon, Feb 06 '17 at 11:01
Remove the deletion of 128-255. There does not exist something like a "7-bit extended ascii table". The only 128-255 control set I know is the one in UTF-8 and this should not be touched as it could contain (in Windows) the euro sign and other characters as stated in my answer. P.S I verified your benchmark. `preg_replace`is faster. — mgutt, Feb 06 '17 at 11:20
P.S. maybe you think about adding `chr(160) (NO-BREAK SPACE)` and `chr(173) (SOFT HYPHEN)`. They are non-printable, too. — mgutt, Feb 06 '17 at 11:33
`$string = preg_replace('/[\x00-\x1F\x7F\xA0]/u', '', $string);` worked perfectly to sanitize data from `iptcparse`, thank you! — Patrizio Bekerle, Feb 20 '17 at 06:46
Didn't work with LSEP, for example: `print(preg_replace('/[\x00-\x1F\x7F\xA0]/u', '', "  An"));` Dalin's answer worked with [:print:]... Stackoverflow is stripping LSEP it seems... — giorgio79, May 15 '18 at 13:09
I need to remove every space and nbsp (160), so this `$string = preg_replace('/\s+/u', '', $string);` is enough for me... — dmnc, May 06 '19 at 10:24
This fails on most whitespace, which renders as a space in browsers but gets stripped out entirely here. [**Demo of Code Above Stripping Out Whitespace**](https://ideone.com/VtBHr6) — HoldOffHunger, Dec 27 '21 at 16:08
If you want to check binary data, like a file, you have to remove the "u" modifier from the UTF-8 solution. The documentation says that simply nothing will be matched, but the function seems to return a completely empty result instead. If I'm not wrong, then use the modifier for UTF encoded texts and remove it if using with other binary data, like files. — StanE, Feb 21 '22 at 14:17

score 159 · Answer 2 · edited Sep 08 '22 at 17:27

Many of the other answers here do not take into account unicode characters (e.g. öäüßйȝîûηыეமிᚉ⠛ ). In this case you can use the following:

$string = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]/u', '', $string);

There's a strange class of characters in the range \x80-\x9F (Just above the 7-bit ASCII range of characters) that are technically control characters, but over time have been misused for printable characters. If you don't have any problems with these, then you can use:

$string = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/u', '', $string);

If you wish to also strip line feeds, carriage returns, tabs, non-breaking spaces, and soft-hyphens, you can use:

$string = preg_replace('/[\x00-\x1F\x7F-\xA0\xAD]/u', '', $string);

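Note that you must use single quotes for the above examples. Inside double quotes, PHP itself expands escape sequences such as \x1F before the pattern reaches the regex engine, so the engine never sees the ranges as written.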
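
A quick way to see the difference between the two quoting styles is to compare what PHP actually stores in the string. This is only an illustrative sketch with made-up variable names; what preg_replace then does with a pattern containing raw control bytes depends on the PHP version, so no result is claimed for that here.

```php
<?php
// Single quotes: the backslashes reach the regex engine untouched.
$single = '/[\x00-\x1F]/';
// Double quotes: PHP replaces \x00 and \x1F with the raw bytes first.
$double = "/[\x00-\x1F]/";

var_dump(strlen($single));  // int(13)
var_dump(strlen($double));  // int(7)
var_dump(bin2hex($double)); // "2f5b002d1f5d2f"
```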

If you wish to strip everything except basic printable ASCII characters (all the example characters above will be stripped) you can use:

$string = preg_replace('/[^[:print:]]/', '', $string);

For reference see http://www.fileformat.info/info/charset/UTF-8/list.htm

edited Sep 08 '22 at 17:27 by rybo111 · answered Nov 17 '11 at 17:50 by Dalin

Your regexp handles UTF8 characters fine; but it strips non-UTF8 "special" characters; like ç, ü and ö. `'/[\x00-\x1F\x80-\xC0]/u'`leaves them intact; but also division (F7) and multiplication (D7) sign. – Hazard May 09 '12 at 11:11
@Hazar yes you are correct \x80-\xFF stripped out too much, but \x80-\xC0 is still too restrictive. This would miss other printable characters like ©£±. For reference see http://www.utf8-chartable.de/ – Dalin Feb 07 '13 at 19:46
The third example with :print: behaves differently on different machines. It worked on localhost, but didn't strip the same characters on our live server. The first example stripped regular numbers from my string on localhost. – Josh Ribakoff Oct 29 '13 at 17:55
@JoshRibakoff I don't see how [:print:] could show different results on different machines, it is a POSIX standard: http://en.wikipedia.org/wiki/Regular_expression#Character_classes also I don't see how the first example could strip regular numbers, you'll need to give more info. – Dalin Oct 30 '13 at 15:58
Not sure what more info to give. Perhaps some server setting differed such as the locale. I don't know where to start to debug it, I fixed it using a whitelist of allowed characters which was kind of a pain but ended up getting the job done. – Josh Ribakoff Oct 30 '13 at 19:12
@Dalin I noticed at least the second option doesn't work with double quotes, but it does with single quotes. Do you know why this is? – Tim Malone Jun 06 '16 at 05:45
@TimMalone because PHP will expand those character sequences: http://php.net/manual/en/language.types.string.php#language.types.string.syntax.double so the regex won't see the range that you're trying to tell it about. – Dalin Oct 20 '16 at 16:20
What about 7F? Should it not be `\x7F-\x9F`? – Bell Nov 23 '16 at 18:54
@Bell Good catch. Fixed. – Dalin Nov 28 '16 at 15:57
Why do you want to remove the euro sign `\x80` and all the other printable characters?! Look my answer: http://stackoverflow.com/a/42058165/318765 – mgutt Feb 05 '17 at 23:06
@mgutt [`\x80` isn't the euro in unicode](http://www.fileformat.info/info/unicode/char/0080/index.htm), [`\x20AC` is](http://www.fileformat.info/info/unicode/char/20ac/index.htm). [`\x80` is the euro in some other encodings](https://www.microsoft.com/typography/EuroSymbolFAQ.mspx) but in Unicode it's technically a control character. If you want to leave it in, go for it. – Dalin Feb 07 '17 at 03:26
@Dalin Read my answer. I provided sources were you can see the behaviour. It seems to be a backwards compatibility to CP-1252. Test it by yourself through entering `€ ` on this website: https://mothereff.in/html-entities I see three euro signs. – mgutt Feb 07 '17 at 06:43
I just tried a lot, i tried every encoding function available in PHP from regex to mb_ to htmlspecialchars etc. Nothing removed control characters, thanks for investing the work. – John Jan 06 '18 at 03:27
:print is too restrictive and will lose euro and pound signs among other things. Just use `/[\x00-\x1F\x80-\xC0]/u` – user1432181 Jan 06 '22 at 16:23

Kevin Nelson · Answer 3 · 2015-03-10T18:27:36.970

Starting with PHP 5.2, we also have access to filter_var, which I have not seen any mention of so thought I'd throw it out there. To use filter_var to strip non-printable characters < 32 and > 127, you can do:

Filter ASCII characters below 32:

$string = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);

Filter characters above 127:

$string = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_HIGH);

Strip both:

$string = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW|FILTER_FLAG_STRIP_HIGH);

You can also html-encode low characters (newline, tab, etc.) while stripping high:

$string = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_ENCODE_LOW|FILTER_FLAG_STRIP_HIGH);

There are also options for stripping HTML, sanitizing e-mails and URLs, etc. So, lots of options for sanitization (strip out data) and even validation (return false if not valid rather than silently stripping).

Sanitization: http://php.net/manual/en/filter.filters.sanitize.php

Validation: http://php.net/manual/en/filter.filters.validate.php

However, there is still the problem that FILTER_FLAG_STRIP_LOW will strip out newlines and carriage returns, which are completely valid characters for a textarea. So some of the regex answers are, I guess, still necessary at times. For example, after reviewing this thread, I plan to do this for textareas:

$string = preg_replace( '/[^[:print:]\r\n]/', '',$input);

This seems more readable than a number of the regexes that stripped out by numeric range.
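
To make the textarea trade-off concrete, here is a small sketch comparing the two approaches on a made-up input. It assumes the default "C" locale, so that [:print:] covers only printable ASCII.

```php
<?php
// Made-up textarea input: two lines, a tab, and a stray BEL control character.
$input = "line one\r\nline two\t\x07";

// FILTER_FLAG_STRIP_LOW drops every byte below 32, including the line break.
$filtered = filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);
var_dump($filtered === "line oneline two");

// The regex keeps \r and \n but still removes the tab and BEL.
$viaRegex = preg_replace('/[^[:print:]\r\n]/', '', $input);
var_dump($viaRegex === "line one\r\nline two");
```

If tabs are also valid in your textarea, add \t to the class.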

Other answers didn't work for me, the "filter_var()" solution did it perfectly. Thanks after 7 years! :) — VG-Electronics, Nov 25 '22 at 09:45

score 27 · Answer 4 · answered Jul 24 '09 at 10:57

You can use a POSIX character class:

/[[:cntrl:]]+/
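
POSIX classes are not limited to the old ereg functions; the PCRE-based preg_* functions accept them inside a bracket expression too. A minimal sketch with a made-up input, assuming the default "C" locale:

```php
<?php
// preg_* functions use PCRE, which understands POSIX classes inside [...].
$input = "a\x00b\x7Fc\td";
$clean = preg_replace('/[[:cntrl:]]+/', '', $input);
var_dump($clean === "abcd"); // NUL, DEL and the tab are all control characters
```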

answered by ghostdog74

doesn't this require me to use ereg though? – Stewart Robinson Jul 24 '09 at 11:05

score 23 · Answer 5 · edited May 23 '17 at 11:47

All of the solutions work partially, and even the one below probably does not cover every case. My issue was in trying to insert a string into a utf8 MySQL table. The string (and its bytes) all conformed to UTF-8, but had several bad sequences. I assume that most of them were control or formatting characters.

function clean_string($string) {
  $s = trim($string);
  $s = iconv("UTF-8", "UTF-8//IGNORE", $s); // drop all non utf-8 characters

  // this is some bad utf-8 byte sequence that makes mysql complain - control and formatting i think
  $s = preg_replace('/(?>[\x00-\x1F]|\xC2[\x80-\x9F]|\xE2[\x80-\x8F]{2}|\xE2\x80[\xA4-\xA8]|\xE2\x81[\x9F-\xAF])/', ' ', $s);

  $s = preg_replace('/\s+/', ' ', $s); // reduce all multiple whitespace to a single space

  return $s;
}

The problem is further complicated by the table vs. server vs. connection vs. rendering encoding of the content, as discussed a little here

edited May 23 '17 at 11:47 by Community · answered Dec 24 '13 at 20:52 by Wayne Weibel

The only one that passes all my unit tests, awesome! – Korri Apr 08 '16 at 22:07
\xE2\x80[\xA4-\xA8] (or 226.128.[164-168]) - is wrong, the sequence include next printable symbols: Unicode Character 'ONE DOT LEADER' (U+2024), Unicode Character 'TWO DOT LEADER' (U+2025), Unicode Character 'HORIZONTAL ELLIPSIS' (U+2026), Unicode Character 'HYPHENATION POINT' (U+2027). And only one non-printable: Unicode Character 'LINE SEPARATOR' (U+2028). Next one is non-printable too: Unicode Character 'PARAGRAPH SEPARATOR' (U+2029). So replace the sequence with: \xE2\x80[\xA8-\xA9] \xE2\x80[\xA8-\xA9] to remove LINE SEPARATOR and PARAGRAPH SEPARATOR. – MingalevME Mar 07 '18 at 09:14
This is the best solution I could find so far, but I laso had to add `$s = preg_replace('/(\xF0\x9F[\x00-\xFF][\x00-\xFF])/', ' ', $s);` because of all the emoji characters were messing up mysql – Joe Black May 18 '19 at 10:35
Unfortunately the "bad utf-8" Regex above also removes line breaks! – Avatar Mar 10 '22 at 13:52

jacktrade · Answer 6 · 2017-02-23T09:14:50.923

score 17

This is simpler:

$string = preg_replace( '/[[:cntrl:]]/', '',$string);

answered Apr 20 '11 at 09:40

This also strips line feeds, carriage returns, and UTF8 characters. – Dalin Dec 17 '11 at 19:26
@Dalin There is no such thing as an “UTF-8 character”. There are Unicode symbols/characters, and UTF-8 is an encoding that can represent all of them. You meant to say this strips _characters outside of the ASCII range_ as well. – Mathias Bynens Dec 31 '12 at 13:36
Eats up Arabic characters :) – Rolf Jun 26 '13 at 15:56

score 14 · Answer 7 · answered Jan 07 '19 at 09:56

To strip all non-ASCII characters from the input string

$result = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);

That code removes any characters in the ranges 0-31 and 128-255 (hex 00-1F and 80-FF), leaving only characters 32-127 in the resulting string, which I call $result in this example.

answered by Junaid Masood

Why would I want 127, which is DEL ? Wouldn't it be better as `[\x00-\x1F\x7F-\xFF]` to remove 127 to 255 instead of 128 to 255 ? – Volomike Feb 09 '22 at 06:06

cedivad · Answer 8 · 2022-01-25T17:13:41.947

score 13

For UTF-8, try this:

preg_replace('/[^\p{L}\s]/u','', $string);

That was my original answer from 10 years ago, and as the comments are saying, this is well suited for feeding a full text search engine, as it removes some non-text printable characters like []!~ etc.

If you also need to remove invalid characters for say, feeding libexpat (sigh.), you can try:

preg_replace('/[^\PCc^\PCn^\PCs]/u', '', $string);

See this answer for more on the method.

answered May 06 '12 at 12:56

This will remove characters like quotes, brackets, etc. Those are certainly printable characters. – Gajus Jan 27 '14 at 21:37
this is wonderful! it saved my life, messed up while printing Arabic characters, worked like champ :) – krishna May 26 '16 at 14:33
This can be useful when only pure words are needed. For example, for a search engine on the page and an index in the database. Parentheses, periods and commas are then unnecessary. – Robert Oct 15 '21 at 09:14

score 10 · Answer 9 · answered Jul 24 '09 at 10:50

You could use a regular expression to remove everything apart from those characters you wish to keep:

$string=preg_replace('/[^A-Za-z0-9 _\-\+\&]/','',$string);

Replaces everything that is not (^) the letters A-Z or a-z, the numbers 0-9, space, underscore, hyphen, plus and ampersand with nothing (i.e. removes it).

score 6 · Answer 10 · answered Mar 01 '13 at 11:06

preg_replace('/(?!\n)[\p{Cc}]/', '', $response);

This will remove all the control characters (http://uk.php.net/manual/en/regexp.reference.unicode.php) leaving the \n newline characters. From my experience, the control characters are the ones that most often cause the printing issues.
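
For readers wondering how the pattern spares the newline, here is a short sketch with a made-up input. The /u modifier is my addition (one commenter also added it) so that the input is treated as UTF-8.

```php
<?php
// (?!\n) is a negative lookahead: it succeeds only if the next character is
// not a newline, and consumes nothing. \p{Cc} then matches one control char.
$response = "a\x00b\nc\x1Fd\r";
$clean = preg_replace('/(?!\n)\p{Cc}/u', '', $response);
var_dump($clean === "ab\ncd"); // \n survives; NUL, US and \r are removed
```

To keep carriage returns or tabs as well, extend the lookahead, e.g. (?![\r\n\t]).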

answered by Gajus

It works perfect for me! I added just `/u` for UTF-8 chars. Could you please explain what the first part `(?!\n)` does? – Marcio Mazzucato May 15 '17 at 19:53
Perfect ! I was looking for a way to remove unicode 'useless' characters and keep the important one (letters including accent, numbers, special chars) . Thanks for the answer and the documentation link – azerto00 Oct 26 '20 at 15:37

score 5 · Answer 11 · edited May 23 '17 at 12:10

The answer of @PaulDixon ~~is completely wrong, because it removes the printable extended ASCII characters 128-255!~~ has been partially corrected. I don't know why he still wants to delete 128-255 from a 127 chars 7-bit ASCII set as it does not have the extended ASCII characters.

It is still important not to delete 128-255, because for example chr(128) (\x80) is the euro sign in 8-bit ASCII, and in my own tests many UTF-8 fonts on Windows and Android display it as a euro sign.

Removing the bytes 128-255 from a UTF-8 string will also destroy many characters, because every byte of a multi-byte UTF-8 character falls in that range. So don't do that! They are completely legal characters in all currently used file systems. The only reserved range is 0-31.

Instead use this to delete the non-printable characters 0-31 and 127:

$string = preg_replace('/[\x00-\x1F\x7F]/', '', $string);

It works in ASCII and UTF-8 because both share the same control set range.

The ~~fastest~~ slower¹ alternative without using regular expressions:

$string = str_replace(array(
    // control characters
    chr(0), chr(1), chr(2), chr(3), chr(4), chr(5), chr(6), chr(7), chr(8), chr(9), chr(10),
    chr(11), chr(12), chr(13), chr(14), chr(15), chr(16), chr(17), chr(18), chr(19), chr(20),
    chr(21), chr(22), chr(23), chr(24), chr(25), chr(26), chr(27), chr(28), chr(29), chr(30),
    chr(31),
    // non-printing characters
    chr(127)
), '', $string);

If you want to keep all whitespace characters \t, \n and \r, then remove chr(9), chr(10) and chr(13) from this list. Note: The usual whitespace is chr(32) so it stays in the result. Decide for yourself if you want to remove non-breaking space chr(160) as it can cause problems.
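
Rather than editing the long chr() list by hand, the list can also be generated. The following is only a sketch of that idea with hypothetical variable names, not part of the benchmarked code:

```php
<?php
// Codes to keep: tab (9), line feed (10), carriage return (13).
$keep = [9, 10, 13];

// All control codes 0-31 plus DEL (127), minus the ones we want to keep.
$codes  = array_diff(array_merge(range(0, 31), [127]), $keep);
$remove = array_map('chr', $codes);

$string = "col1\tcol2\r\nrow\x00two\x7F";
$clean  = str_replace($remove, '', $string);
var_dump($clean === "col1\tcol2\r\nrowtwo");
```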

¹ Tested by @PaulDixon and verified by myself.

score 2 · Answer 12 · edited Apr 08 '15 at 18:17

how about:

return preg_replace("/[^a-zA-Z0-9`_.,;@#%~'\"\+\*\?\[\^\]\$\(\)\{\}\=\!\<\>\|\:\-\s\\\\]+/", "", $data);

gives me complete control of what I want to include

edited Apr 08 '15 at 18:17 by George Brighton · answered Apr 11 '14 at 04:05 by sdfor

score 2 · Answer 13 · answered Jun 12 '20 at 10:51

The regex in the selected answer fails for Unicode 0x1D (with PHP 7.4).

A solution:

<?php
        $ct = 'différents'."\r\n test";

        // fails for Unicode 0x1D (note the trailing $, which anchors the match to the end of the string)
        $ct = preg_replace('/[\x00-\x1F\x7F]$/u', '',$ct);

        // works for Unicode 0x1D
        $ct =  preg_replace( '/[^\P{C}]+/u', "",  $ct);

        // works for Unicode 0x1D and allows line breaks
        $ct =  preg_replace( '/[^\P{C}\n]+/u', "",  $ct);

        echo $ct;

from: UTF 8 String remove all invisible characters except newline
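
The double negative in [^\P{C}\n] is easy to misread, so here is a small sketch with a made-up input (the "\u{...}" escape assumes PHP 7.0 or later). Note that category C is broader than the ASCII control range: it also includes format characters such as the zero width space.

```php
<?php
// U+200B ZERO WIDTH SPACE is a format character (Cf); \x1D is a control (Cc).
$ct = "a\u{200B}b\x1Dc\nd";

// [^\P{C}\n] reads as "not (a non-C character or \n)", i.e. any character in
// category C other than the line feed.
$clean = preg_replace('/[^\P{C}\n]+/u', '', $ct);
var_dump($clean === "abc\nd");
```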

score 1 · Answer 14 · answered Dec 28 '17 at 18:22

For anyone who is still looking for a way to do this without removing the non-printable characters, but rather escaping them, I made this to help out. Feel free to improve it! Characters are escaped to \\x[A-F0-9][A-F0-9].

Call like so:

$escaped = EscapeNonASCII($string);

$unescaped = UnescapeNonASCII($escaped);

<?php 
  function EscapeNonASCII($string) //Convert string to hex, replace non-printable chars with escaped hex
    {
        $hexbytes = strtoupper(bin2hex($string));
        $i = 0;
        while ($i < strlen($hexbytes))
        {
            $hexpair = substr($hexbytes, $i, 2);
            $decimal = hexdec($hexpair);
            if ($decimal < 32 || $decimal > 126)
            {
                $top = substr($hexbytes, 0, $i);
                $escaped = EscapeHex($hexpair);
                $bottom = substr($hexbytes, $i + 2);
                $hexbytes = $top . $escaped . $bottom;
                $i += 8;
            }
            $i += 2;
        }
        $string = hex2bin($hexbytes);
        return $string;
    }
    function EscapeHex($string) //Helper function for EscapeNonASCII()
    {
        $x = "5C5C78"; //\x
        $topnibble = bin2hex($string[0]); //Convert top nibble to hex
        $bottomnibble = bin2hex($string[1]); //Convert bottom nibble to hex
        $escaped = $x . $topnibble . $bottomnibble; //Concatenate escape sequence "\x" with top and bottom nibble
        return $escaped;
    }

    function UnescapeNonASCII($string) //Convert string to hex, replace escaped hex with actual hex.
    {
        $stringtohex = bin2hex($string);
        $stringtohex = preg_replace_callback('/5c5c78([a-fA-F0-9]{4})/', function ($m) { 
            return hex2bin($m[1]);
        }, $stringtohex);
        return hex2bin(strtoupper($stringtohex));
    }
?>

score 0 · Answer 15 · answered Aug 08 '13 at 03:54

The marked answer is perfect, but it misses character 127 (DEL), which is also a non-printable character.

My answer would be:

$string = preg_replace('/[\x00-\x1F\x7f-\xFF]/', '', $string);

answered by Mubashar

This answer is wrong, too. See: http://stackoverflow.com/a/42058165/318765 – mgutt Feb 06 '17 at 07:31
above answer was a compliment to original answer which only adds up "delete" character. – Mubashar Mar 16 '20 at 00:54

score 0 · Answer 16 · answered Mar 14 '15 at 12:07

"cedivad" solved the issue for me with persistent result of Swedish chars ÅÄÖ.

$text = preg_replace( '/[^\p{L}\s]/u', '', $text );

Thanks!

answered by Andreas Ek

score -1 · Answer 17 · answered Jul 03 '18 at 08:55

I solved the problem for UTF-8 using https://github.com/neitanod/forceutf8

use ForceUTF8\Encoding;

$string = Encoding::fixUTF8($string);

answered by Nick

This lib converts UTF-8 accented characters and UTF-8 emoticons to "?" symbols. Fairly serious issue unfortunately. – ChristoKiwi Sep 17 '18 at 23:15
