0

Here is my PHP code:

<?php
$output = "Clean this copy of invalid non ASCII ¥ äócharacters.";
$output = preg_replace('/[^(\x20-\x7F\p{Sc})]/','',$output);    
echo($output);
?>

I want to keep all currency symbols as they are and remove the junk characters.

What changes should I make to the regex to achieve this?

Thanks in advance

akirk

3 Answers

1

You need to add the u modifier to the RegEx and it works nicely:

$output = 'Clean this copy of invalid non ASCII ¥$€ äócharacters.';
$output = preg_replace('/[^(\x20-\x7F\p{Sc})]/u','',$output);

Outputs

Clean this copy of invalid non ASCII ¥$€ characters.
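
The same idea can be sketched as a small function. This is only an illustration under some assumptions: the name `keepAsciiAndCurrency` is made up, the file is assumed to be saved as UTF-8, the superfluous parentheses are dropped, and the range ends at `~` (`\x7E`) rather than DEL (`\x7F`), since DEL is a control character.

```php
<?php
// Hypothetical helper; not part of the original answer.
function keepAsciiAndCurrency($text)
{
    // The u modifier makes PCRE treat both pattern and subject as UTF-8,
    // so \p{Sc} is tested against whole code points instead of single bytes.
    $clean = preg_replace('/[^\x20-\x7E\p{Sc}]/u', '', $text);

    // With the u modifier, preg_replace() returns null when the subject
    // is not valid UTF-8, so that case is reported instead of ignored.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $clean;
}

$output = 'Clean this copy of invalid non ASCII ¥$€ äócharacters.';
echo keepAsciiAndCurrency($output), "\n";
// Clean this copy of invalid non ASCII ¥$€ characters.
```

The null check matters in practice: if the input arrives in another encoding such as Latin-1, the call fails outright rather than silently producing a wrong string.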
akirk
  • I got this output: Clean this copy of invalid non ASCII Â¥ characters. It contains the Â character too. – Priya Bhojani Mar 12 '14 at 09:12
  • You need to be aware that the output will be UTF-8, too. Where do you look at the results? If it is in a browser, you need to tell it that you are outputting UTF-8, for example by adding `header("Content-type: text/html;charset=utf-8");` right at the start of the script. – akirk Mar 12 '14 at 09:14
  • @akirk : PHP Version 5.3.10-1ubuntu3.8 – Priya Bhojani Mar 12 '14 at 09:16
  • see how it works here: https://eval.in/118611 even without the charset: https://eval.in/118612 – akirk Mar 12 '14 at 09:17
  • @akirk: so won't it work with PHP Version 5.3.10-1ubuntu3.8? – Priya Bhojani Mar 12 '14 at 09:18
  • @Priya Perhaps you're just dealing with an encoding problem!? – deceze Mar 12 '14 at 09:19
  • @deceze: so how should I proceed? Please help. – Priya Bhojani Mar 12 '14 at 09:21
  • @Priya Bhojani: it does work in that PHP version, too. I suspect, like deceze, that you are jumping between encodings without knowing it. Make sure your editor's encoding is set to UTF-8, and try to ensure that every program in the chain uses UTF-8. – akirk Mar 12 '14 at 09:21
  • 1
    @deceze Concur; `Â¥` looks like UTF-8 viewed by something which is using Latin-1. Just make sure you have UTF-8 everywhere. – tripleee Mar 12 '14 at 09:21
  • 2
    The parentheses in the character class are superfluous. Here, they are harmless, but you should understand why they are wrong. – tripleee Mar 12 '14 at 09:22
  • @akirk: actually, this character causes problems on the iPhone; the system gets stuck if it receives these characters. So I need to remove them. Tell me if I am going about this the wrong way. – Priya Bhojani Mar 12 '14 at 09:25
  • @tripleee: sorry, I did not follow. Can you please elaborate? – Priya Bhojani Mar 12 '14 at 09:26
  • Ensure that you send the Content-Type header with the encoding so that the device displaying the HTML knows that it is in UTF-8. Just start over and try to stick with UTF-8 everywhere on your path and you should be fine. – akirk Mar 12 '14 at 09:27
  • 1
    The regex `[(foo)]` is exactly equivalent to the regex `[()fo]`. That is, everything inside the square brackets is a literal, and there is no order (ranges excepted); duplicates are superfluous, but harmless. So `[( -~)]` is equivalent to `[() -~]` but since `()` are already included in the range, it's also equivalent to just `[ -~]`. – tripleee Mar 12 '14 at 09:43
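
Two claims from the comments above can be checked with a short script: that `Â¥` is what the UTF-8 bytes of `¥` look like when they are read as Latin-1, and that the parentheses inside the character class change nothing. This is a rough sketch; `readAsLatin1` is a made-up helper, and the file is assumed to be saved as UTF-8.

```php
<?php
// 1. "¥" (U+00A5) is stored as two bytes in UTF-8.
echo bin2hex('¥'), "\n"; // c2a5

// Hypothetical helper: treat every byte as one Latin-1 character and
// re-encode it as UTF-8, which mimics a viewer that assumes Latin-1.
function readAsLatin1($bytes)
{
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $code = ord($byte);
        $out .= $code < 0x80
            ? $byte
            : chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
    }
    return $out;
}

echo readAsLatin1('¥'), "\n"; // Â¥

// 2. Compare the class with and without parentheses for every ASCII byte.
$same = true;
for ($i = 0; $i < 128; $i++) {
    $char = chr($i);
    if (preg_match('/\A[( -~)]\z/', $char) !== preg_match('/\A[ -~]\z/', $char)) {
        $same = false;
    }
}
var_dump($same); // bool(true)
```

So the stray `Â` is not left behind by the regex. It appears when correct UTF-8 output is displayed by something that assumes a single-byte encoding.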
0

First question: what do you want to achieve? You can build the regex yourself.

Like:

  • Uppercase (Regex = A-Z)
  • Lowercase (Regex = a-z)
  • Numbers (Regex = 0-9)
  • Special characters (Regex = / (slash), \s (whitespace), etc.)

Putting these together gives, for example: A-Za-z0-9. However, there are many ways to write a regex, and I don't know your exact case or what you want to achieve. I hope this post helps :)
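
As a minimal sketch of how those pieces combine (the input string is made up for illustration), the ranges are written side by side inside one bracketed class, and a leading `^` negates it so that `preg_replace` removes everything else:

```php
<?php
// Illustrative input; assumes the file is saved as UTF-8.
$input = 'Price: 100 ¥!';

// Keep only letters and digits. Without the u modifier the two bytes
// of ¥ are removed one at a time, which gives the same result here.
$lettersDigits = preg_replace('/[^A-Za-z0-9]/', '', $input);

// Adding \s to the class keeps whitespace as well.
$withSpace = preg_replace('/[^A-Za-z0-9\s]/', '', $input);

var_dump($lettersDigits); // "Price100"
var_dump($withSpace);     // "Price 100 " (the trailing space is kept)
```

Note that a class like this also removes the currency symbol, so on its own it does not solve the problem in the question.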

eL-Prova
  • I need to keep currency symbols and all ASCII characters too. I am stuck building a regex for this. Kindly help me. – Priya Bhojani Mar 12 '14 at 09:14
0

Try this:

$output = preg_replace('/[^(A-Za-z\s\$¥€)]/u','',$output);

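This strips out everything other than A-Z, a-z, whitespace, and the symbols $, ¥, and €. Because the parentheses sit inside the character class, they are literals, so ( and ) are kept too. Digits and punctuation are removed.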
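
To see how an explicit list differs from the `\p{Sc}` property used in the question, here is a small comparison. The input string is invented for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Illustrative input with a currency symbol (£) that is not in the list.
$input = 'Cost: 25 £ or 30 €.';

// Explicit list, as in the pattern above: only $, ¥ and € survive.
$listed = preg_replace('/[^(A-Za-z\s\$¥€)]/u', '', $input);

// Printable ASCII plus any character with the Unicode "currency symbol"
// property, so £ is kept without being listed.
$byProperty = preg_replace('/[^\x20-\x7E\p{Sc}]/u', '', $input);

var_dump($listed);     // letters, spaces and € only; digits, ":", "." and £ are gone
var_dump($byProperty); // identical to $input
```

The explicit list is easier to read but has to be extended for every new symbol, while `\p{Sc}` covers all characters that Unicode classifies as currency symbols.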

SajithNair