1

I want to remove all HTML entities such as &quot; &euro; &aacute; ... from a string using a regex.

String: "This is a string &quot; &euro; &aacute; &amp;"

Output Required: This is a string

Sukanta Paul
  • [How to remove html special chars?](http://stackoverflow.com/questions/657643/how-to-remove-html-special-chars). – s7anley May 30 '12 at 06:50
  • I am new to regex and want to create something which will index words from a webpage. – Sukanta Paul May 30 '12 at 06:52
  • Since you just want to get words, why not instead write a regex that finds all the words (ignoring anything with characters other than a-z and certain punctuation)? – Okonomiyaki3000 May 30 '12 at 06:57

5 Answers

2

You can try:

$str = "This is a string &quot; &euro; &aacute; &amp;";
$new_str = preg_replace("/&#?[a-z0-9]+;/i", '', $str);
echo $new_str;

I hope this works.

Description:

& - starts with an ampersand
#? - optionally followed by a # sign (numeric entities use it)
[a-z0-9]+ - followed by one or more letters or digits
; - ends with a semicolon
i - case-insensitive modifier
Zuber Surya
0
echo preg_replace('#&[^;]+;#', '', "This is a string &quot; &euro; &aacute; &amp;");
j0k
s.webbandit
0

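Try this:

preg_replace('/[^\w\s]+/', '', html_entity_decode($string, ENT_QUOTES, 'UTF-8'));

This decodes the entities first and then strips everything that is not a word character or whitespace. Note that htmlspecialchars_decode() only handles a few entities (such as &amp;, &quot;, &lt; and &gt;), so html_entity_decode() is used here instead. It might also remove some things you want to keep, such as punctuation, so you may need to modify the regex.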
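Since the asker's stated goal is to index words, the decode-then-filter idea can also be turned around: decode the entities, then collect the words rather than deleting everything else. This is a rough sketch under a few assumptions: the input is UTF-8, PHP 5.4 or later is available (for `ENT_HTML5`), and `extract_words` is a hypothetical helper name.

```php
<?php
// Hypothetical helper for illustration only.
function extract_words($html_text)
{
    // Turn entities into the characters they stand for.
    $decoded = html_entity_decode($html_text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // \p{L}+ matches runs of letters; the u modifier treats the input as UTF-8.
    preg_match_all('/\p{L}+/u', $decoded, $matches);

    return $matches[0];
}

print_r(extract_words('This is a string &quot; &euro; &aacute; &amp;'));
```

With this approach the decoded `á` counts as a one-letter word, while the quote, the euro sign and the ampersand are skipped. Whether that is desirable depends on what the index should contain.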

Okonomiyaki3000
0
$str = preg_replace_callback('/&[^; ]+;/', function($matches){
    return html_entity_decode($matches[0], ENT_QUOTES) == $matches[0] ? $matches[0] : '';
}, $str);

This works, but on older PHP versions it may leave &euro; in place, because html_entity_decode() does not decode it by default there. With PHP 5.4 or later, you can pass the flags ENT_QUOTES | ENT_HTML5 so that HTML5 entities are handled as well.
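A sketch of the PHP 5.4+ variant mentioned above, assuming UTF-8 input. The sample string is extended with a made-up `&bogus;` and a bare `AT&T` (both are illustrative additions, not from the question) to show that only sequences which actually decode to something are removed.

```php
<?php
// Requires PHP 5.4 or later for ENT_HTML5.
$str = 'This is a string &quot; &euro; &aacute; &amp; &bogus; AT&T';

$str = preg_replace_callback('/&[^; ]+;/', function ($matches) {
    $decoded = html_entity_decode($matches[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Unchanged after decoding means it was not a known entity, so keep it.
    return $decoded === $matches[0] ? $matches[0] : '';
}, $str);

echo $str;
```

The trade-off compared with a plain `preg_replace` is an extra decode call per match, in exchange for not deleting text that merely looks like an entity.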

Paul
0

If you're trying to remove entities entirely (i.e., without decoding them), then try this:

$string = 'This is a string &quot; &euro; &aacute; &amp;';

$pattern = '/&([#0-9A-Za-z]+);/';
echo preg_replace($pattern, '', $string);
ulentini