2

It seems I have another problem with special characters, double quotes and so on, following this question that was solved earlier.

I used to use this function, which converts symbols like '&' to numeric codes for XML:

function convert_specialchars_to_xmlenties($string) 
{ 

    # in order to convert <, >, &, ' and ", include them in the square brackets [<\'"&>\x80-\xff]
    $output = preg_replace('/([<\'"&>\x80-\xff])/e', "'&#' . ord('$1') . ';'", $string);

    # return the result
    return $output; 
}

So if my input is Judge-Fürstová Mila & Judge-Fürstová Mila

I will get Judge-F&#252;rstov&#225; Mila &#38; Judge-F&#252;rstov&#225; Mila

But I am now using PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8" to handle all my special characters, so if my input is something like

Judge-Fürstová Mila & Judge-Fürstová Mila

the function now returns,

Judge-F&#195;&#188;rstov&#195;&#161; Mila &#38; Judge-F&#195;&#188;rstov&#195;&#161; Mila

which I think is incorrect for XML.
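The doubled numbers come from the encoding: `ord()` works on single bytes, and the character class `[\x80-\xff]` without the `/u` modifier also matches one byte at a time. A minimal sketch, assuming the first input was Latin-1 and the second is UTF-8 (the variable names are only for illustration):

```php
<?php
// "ü" is U+00FC. In Latin-1 it is the single byte 0xFC;
// in UTF-8 it is the two-byte sequence 0xC3 0xBC.
$latin1 = "\xFC";
$utf8   = "\xC3\xBC";

echo ord($latin1), "\n";                       // 252
echo ord($utf8[0]), ' ', ord($utf8[1]), "\n";  // 195 188
echo strlen($utf8), "\n";                      // 2
```

So `&#195;&#188;` is two references to the bytes of one character, where XML expects the code point (`&#252;`) or the raw UTF-8 character itself.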

So I think I should convert only <, >, &, ' and ", and leave other special characters like ü or á alone.

Any ideas how I can do this? Or maybe I have misunderstood the problem and there are better ways to solve it?

EDIT:

I was wrong. I just changed the function so that it converts only <, >, &, ' and ":

$output = preg_replace('/([<\'"&>])/e', "'&#' . ord('$1') . ';'", $string);

but XML still does not accept the converted output below,

Judge-Fürstová Mila &#38; Judge-Fürstová Mila

I cannot think of any other reason why it does that! Any ideas?
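The question does not show the parser's error message, so the cause can only be guessed at. One way to narrow it down is to ask libxml what it objects to. The sketch below assumes the DOM extension is available; `xml_errors()` and the `<name>` wrapper element are hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: returns libxml's error messages for an XML string,
// or an empty array if the string is well-formed.
function xml_errors($xml)
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the previous setting
    return $messages;
}

// Assumes this source file is saved as UTF-8.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<name>Judge-Fürstová Mila &#38; Judge-Fürstová Mila</name>';
print_r(xml_errors($xml));
```

If the list mentions invalid byte sequences, the encoding declared in the XML prolog probably does not match the actual bytes of the string.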

Run

2 Answers

3

You want htmlspecialchars(). Don't let the name throw you off: it converts only the characters you've listed and leaves everything else alone. Pass ENT_QUOTES to make sure the single quote is converted as well.
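A minimal sketch of that suggestion, assuming the input string is UTF-8. `xml_escape()` is a hypothetical wrapper name, not something from the question:

```php
<?php
// Hypothetical helper: escape only the five XML-special characters
// and leave every other character (ü, á, ...) untouched.
function xml_escape($string)
{
    // ENT_QUOTES converts the single quote too (to &#039;), which XML accepts.
    // 'UTF-8' tells PHP how the input is encoded.
    return htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
}

echo xml_escape('a < b & "c"'), "\n";
// a &lt; b &amp; &quot;c&quot;
```

The entities it produces (`&amp;`, `&lt;`, `&gt;`, `&quot;`) are all predefined in XML, and `&#039;` is an ordinary numeric reference, so the result can be placed in element content or attribute values.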

Marc B
-1

Edited answer to cut out all the superfluous stuff and just keep the actual answer

You want mb_ereg_replace_callback, and the callback should handle multibyte characters. Something like:

$out = mb_ereg_replace_callback(
    "[<>&\"']",
    function($a) {
        // $a[0] is the matched text; fold its bytes into one integer
        $o = 0;
        $l = strlen($a[0]);
        for( $i=0; $i<$l; $i++) {
            $o = ($o << 8) | ord($a[0][$i]);
        }
        return "&#".$o.";";
    },
    $in);

Although in this case the callback would be fine with just a simple ord, you might want to reuse this code for other characters sometime.
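One caveat if the code is reused for non-ASCII characters: folding the UTF-8 bytes together gives the byte sequence as an integer (0xC3BC = 50108 for ü), not the Unicode code point (252) that an XML numeric reference needs. A sketch of a code-point-aware variant follows. It swaps in `preg_replace_callback` with the `/u` modifier instead of `mb_ereg_replace_callback`, assumes UTF-8 input and the mbstring extension, and `to_numeric_refs()` is a hypothetical name.

```php
<?php
// Hypothetical helper: replace XML-special and non-ASCII characters
// with numeric character references based on their Unicode code points.
function to_numeric_refs($string)
{
    return preg_replace_callback(
        '/[<>&"\']|[^\x00-\x7F]/u',  // /u makes the pattern match whole UTF-8 characters
        function ($m) {
            // UCS-4BE is the code point as a 4-byte big-endian integer.
            $parts = unpack('N', mb_convert_encoding($m[0], 'UCS-4BE', 'UTF-8'));
            return '&#' . $parts[1] . ';';
        },
        $string
    );
}

echo to_numeric_refs("F\xC3\xBCrstov\xC3\xA1 & Co"), "\n";
// F&#252;rstov&#225; &#38; Co
```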

Niet the Dark Absol
  • thanks for the answer, I just tried it but it is still not getting any better, for instance `&` will get `<x>&</x>` but the converted code still cannot be processed as XML... – Run Apr 19 '12 at 18:49
  • I removed the useless stuff. Try it now. – Niet the Dark Absol Apr 19 '12 at 18:53
  • Thanks Kolink, I get this error though `Fatal error: Call to undefined function mb_ereg_replace_callback() in...` – Run Apr 19 '12 at 18:57
  • Ah, you don't have the mbstring module? Hmm... That kinda makes it hard to work with UTF-8... – Niet the Dark Absol Apr 19 '12 at 19:12
  • I have php_mbstring turned on in WampServer but I dunno why that function is undefined... maybe because I'm on PHP 5.3.10...? – Run Apr 19 '12 at 19:20