How to remove all junk characters but keep currency symbols with a regular expression in PHP?

Question

Here is my PHP code:

<?php
$output = "Clean this copy of invalid non ASCII ¥ äócharacters.";
$output = preg_replace('/[^(\x20-\x7F\p{Sc})]/','',$output);    
echo($output);
?>

I want to keep all currency symbols as they are and remove the junk characters.

What should I change in the regex to achieve this?

Thanks in advance

What about the other hundreds of thousands of possible characters? Define clearly what you want to keep. – deceze Mar 12 '14 at 09:05
I've added that only ASCII characters should be kept. In addition to those, I need currency symbols to be kept. – Priya Bhojani Mar 12 '14 at 09:09
It's better to keep what you want than to remove what you don't need, because there will always be some new character that you want to remove. So get the list of wanted characters and replace all the others. – gbejic Mar 12 '14 at 09:09
@gbejic: I want to keep currency symbols and all ASCII characters. Can you please help me do that? – Priya Bhojani Mar 12 '14 at 09:10
@HarshIT: Thanks, but I need to keep all currency symbols and all ASCII characters too, so it won't work. – Priya Bhojani Mar 12 '14 at 09:15

score 1 · Accepted Answer · answered Mar 12 '14 at 09:10

You need to add the `u` modifier to the regex so that the pattern and the subject are treated as UTF-8; then it works nicely:

$output = 'Clean this copy of invalid non ASCII ¥$€ äócharacters';
$output = preg_replace('/[^(\x20-\x7F\p{Sc})]/u','',$output);

Output:

Clean this copy of invalid non ASCII ¥$€ characters
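
A minimal sketch of the same idea wrapped in a function, assuming the input is meant to be UTF-8 and the source file is saved as UTF-8. The function name `keepAsciiAndCurrency` is hypothetical. The pattern here drops the parentheses and uses `\x7E` as the upper bound so that the DEL control character (`\x7F`) is removed as well; both are adjustments to the pattern above, not part of the original answer.

```php
<?php
// Hypothetical helper: keep printable ASCII and Unicode currency symbols.
function keepAsciiAndCurrency(string $text): string
{
    // \x20-\x7E : printable ASCII (space through tilde)
    // \p{Sc}    : any Unicode currency symbol (requires the u modifier)
    $clean = preg_replace('/[^\x20-\x7E\p{Sc}]/u', '', $text);

    // With the u modifier, preg_replace() returns null when the subject
    // is not valid UTF-8, so fail loudly instead of passing null along.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

echo keepAsciiAndCurrency('Clean this copy of invalid non ASCII ¥$€ äócharacters'), "\n";
```

The null check matters in practice: if the string arrives in another encoding such as Latin-1, the `u` modifier makes the call fail rather than silently producing wrong output.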

akirk

I got this output: Clean this copy of invalid non ASCII Â¥ characters. It contains the Â character too. – Priya Bhojani Mar 12 '14 at 09:12
You need to be aware that the output will be UTF-8, too. Where do you look at the results? If you view them in a browser, you need to tell it that you are outputting UTF-8, for example by adding `header("Content-type: text/html;charset=utf-8");` right at the start of the script. – akirk Mar 12 '14 at 09:14
@akirk: PHP Version 5.3.10-1ubuntu3.8 – Priya Bhojani Mar 12 '14 at 09:16
see how it works here: https://eval.in/118611 even without the charset: https://eval.in/118612 – akirk Mar 12 '14 at 09:17
@akirk: So won't it work with PHP Version 5.3.10-1ubuntu3.8? – Priya Bhojani Mar 12 '14 at 09:18
@Priya Perhaps you're just dealing with an encoding problem!? – deceze Mar 12 '14 at 09:19
@deceze: So how should I proceed? Please help. – Priya Bhojani Mar 12 '14 at 09:21
@PriyaBhojani: It does work in that PHP version, too. I suspect, like deceze, that you are jumping between encodings without knowing it. Make sure your editor's encoding is set to UTF-8, and try to ensure that every program in the chain uses UTF-8. – akirk Mar 12 '14 at 09:21
@deceze Concur; `Â¥` looks like UTF-8 viewed by something which is using Latin-1. Just make sure you have UTF-8 everywhere. – tripleee Mar 12 '14 at 09:21
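
The diagnosis above can be reproduced with a small sketch, assuming the `mbstring` extension is available and the source file is saved as UTF-8. Reinterpreting the UTF-8 bytes of `¥` as Latin-1 yields the same `Â¥` that showed up in the output:

```php
<?php
$yen = '¥';                     // U+00A5, stored as the two UTF-8 bytes C2 A5

// Show the raw bytes of the UTF-8 encoded character.
echo bin2hex($yen), "\n";       // c2a5

// Treat those bytes as Latin-1 and convert them to UTF-8: each byte
// becomes its own character, which is what a misconfigured viewer shows.
$misread = mb_convert_encoding($yen, 'UTF-8', 'ISO-8859-1');
echo $misread, "\n";            // Â¥
```

So the regex did keep the currency symbol; the stray `Â` comes from how the bytes are displayed, not from the replacement.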
The parentheses in the character class are superfluous. Here, they are harmless, but you should understand why they are wrong. – tripleee Mar 12 '14 at 09:22
@akirk: Actually, these characters cause problems on the iPhone; the system gets stuck when it receives them, so I need to remove them. Tell me if I am going about this the wrong way. – Priya Bhojani Mar 12 '14 at 09:25
@tripleee: Sorry, I didn't understand. Can you please elaborate? – Priya Bhojani Mar 12 '14 at 09:26
Ensure that you send the Content-Type header with the encoding so that the device displaying the HTML knows that it is in UTF-8. Just start over and try to stick with UTF-8 everywhere on your path and you should be fine. – akirk Mar 12 '14 at 09:27
The regex `[(foo)]` is exactly equivalent to the regex `[()fo]`. That is, everything inside the square brackets is a literal, and there is no order (ranges excepted); duplicates are superfluous, but harmless. So `[( -~)]` is equivalent to `[() -~]` but since `()` are already included in the range, it's also equivalent to just `[ -~]`. – tripleee Mar 12 '14 at 09:43
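
A quick way to check this equivalence is to run the three variants against the same input. The sketch below uses only PHP's built-in `preg_replace`; the sample string is made up for illustration.

```php
<?php
$input = "price (approx.) ~ 5 \t tabs and é removed";

// Parentheses inside [...] are literal characters, not a group, so all
// three negated classes keep exactly the same set: space through tilde.
$a = preg_replace('/[^( -~)]/', '', $input);
$b = preg_replace('/[^() -~]/', '', $input);
$c = preg_replace('/[^ -~]/',   '', $input);

var_dump($a === $b && $b === $c);   // bool(true)
echo $c, "\n";
```

All three calls strip the tab and the bytes of `é` and leave the parentheses in place, because `(` and `)` already fall inside the space-to-tilde range.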

score 0 · Answer 2 · answered Mar 12 '14 at 09:09

First question: what do you want to achieve? You can build the regex yourself.

For example:

Uppercase (Regex = A-Z)
Lowercase (Regex = a-z)
Numbers (Regex = 0-9)
Special chars (Regex = / (slash), \s (whitespace), etc.)

Put together, this can be: A-Za-z0-9. However, there are many ways to write a regex, and I have no idea about your case or what you want to achieve. I hope this post helps anyway :)
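
The building blocks listed above can be combined into a negated character class, which is the whitelist approach suggested in the comments on the question. A sketch, with hypothetical variable names and a made-up sample string:

```php
<?php
// Pieces of the whitelist, named for readability.
$upper  = 'A-Z';
$lower  = 'a-z';
$digits = '0-9';
$extra  = '\/\s';   // forward slash (escaped for the delimiter) and whitespace

// [^...] matches every character that is NOT in the whitelist.
$pattern = '/[^' . $upper . $lower . $digits . $extra . ']/';

// Everything outside the whitelist (here ':' and '!') is removed.
$result = preg_replace($pattern, '', 'Room 42/B: ok!');
echo $result, "\n";
```

Adding another category, such as currency symbols, then only means appending one more piece to the class.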

eL-Prova

I need to keep currency symbols and all ASCII characters too. I am stuck building this regex. Kindly help me. – Priya Bhojani Mar 12 '14 at 09:14

score 0 · Answer 3 · answered Mar 12 '14 at 09:28

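Try this:

$output = preg_replace('/[^(A-Za-z\s\$¥€)]/u','',$output);

This will strip out anything other than A-Z, a-z, whitespace and the symbols $, ¥ and €. The parentheses are also kept, because they are literal characters inside a character class. Note that digits and punctuation are removed.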

SajithNair

But I need all currencies to be allowed. How would I do that? – Priya Bhojani Mar 12 '14 at 09:37
You have to add them to the pattern. – SajithNair Mar 12 '14 at 09:43
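
One way to extend this pattern to every currency without listing each symbol is the `\p{Sc}` Unicode property used in the accepted answer. A sketch, assuming UTF-8 input and a UTF-8 source file; the sample string is invented:

```php
<?php
$output = 'Pay in £, ₹ or ₩ (äó)!';

// Same whitelist as in this answer, but \p{Sc} replaces the hand-written
// list of symbols and matches any Unicode currency symbol.
// The literal parentheses are dropped from the class.
$output = preg_replace('/[^A-Za-z\s\p{Sc}]/u', '', $output);

echo $output, "\n";
```

The `u` modifier is still required here, since `\p{Sc}` has to match whole UTF-8 characters rather than single bytes.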
