The constructor of SimpleXMLElement needs its first parameter to be well-formed XML.
The string you pass
$str = "<aoc> САМОЛЕТОМ\x02\x01 ТК Адамант, г.Домодедово, мкр-н Востряково, Центральный просп. д.12</aoc>";
is not well-formed XML because it contains characters outside the character range that XML allows, namely:
- Unicode Character 'START OF TEXT' (U+0002) at binary offset 24
- Unicode Character 'START OF HEADING' (U+0001) at binary offset 25
So instead of using SimpleXMLElement to create it from a hand-mangled XML-string (which is error-prone), use it to create the XML you're looking for. Let's give an example.
In the following example I assume you already have the text that should go into the XML element. It creates an XML element similar to the one in your question, with the difference that the exact same string is passed in as the text content of the document element ("<aoc>") instead of being pasted into the markup by hand.
$text = 'САМОЛЕТОМ ТК Адамант, г.Домодедово, мкр-н Востряково, Центральный просп. д.12';
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><aoc/>');
$xml->{0} = $text; // set the document-element's text-content to $text
When done this way, SimpleXML will filter any invalid control-characters for you and the SimpleXMLElement remains stable:
$str = $xml->asXML();
$movies = new SimpleXMLElement($str);
print_r($movies);
/* output:
SimpleXMLElement Object
(
[0] => САМОЛЕТОМ ТК Адамант, г.Домодедово, мкр-н Востряково, Центральный просп. д.12
)
*/
So to finally answer your question:
How can I remove Unicode from string?
You don't want to remove Unicode from the string. The SimpleXML library accepts Unicode strings only (in the UTF-8 encoding). What you want is to remove the Unicode characters that are invalid in XML. The SimpleXML library does that for you when you set node values, which is what it was designed for.
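If you really have to start from a string that is already XML markup, one alternative is to strip the offending characters yourself before handing the string to the constructor. The following is only a minimal sketch: the function name strip_invalid_xml_chars is hypothetical (it is not part of SimpleXML), the sample string is shortened from the one in your question, and I assume the input is valid UTF-8. The character class mirrors the "Char" production of XML 1.0.

```php
<?php
// Hypothetical helper, not part of SimpleXML: removes every code point
// that is outside the XML 1.0 "Char" range. Assumes $utf8 is valid UTF-8.
function strip_invalid_xml_chars(string $utf8): string
{
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $utf8
    );
    if ($clean === null) {
        // preg_replace() returns null on error, e.g. malformed UTF-8 input
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

// Shortened sample: same structure as the string in the question
$str   = "<aoc> САМОЛЕТОМ\x02\x01 ТК Адамант</aoc>";
$clean = strip_invalid_xml_chars($str);

// The cleaned string no longer contains U+0002 / U+0001, so it parses
$xml = new SimpleXMLElement($clean);
echo (string) $xml, "\n";
```

Note that this only deals with invalid characters. It does not escape "<" or "&" inside the text, which is another reason to prefer setting the node value as shown above.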
However, if you try to load non-well-formed XML via the constructor or the constructor functions (simplexml_load_string etc.), it will fail and give you the (important) error.
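To look at that error instead of only getting a warning, you can let libxml collect the errors and print them afterwards. This is a small sketch with a made-up input string ($broken); the exact message text depends on your libxml version, so I do not show any output here.

```php
<?php
// Collect libxml parse errors internally instead of emitting PHP warnings
libxml_use_internal_errors(true);

// Made-up sample: U+0002 is not an allowed character in XML 1.0
$broken = "<aoc>text\x02</aoc>";

// simplexml_load_string() returns false when the string is not well-formed
$xml = simplexml_load_string($broken);

if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError with line, column and message
        printf("line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
    }
    libxml_clear_errors();
}
```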
I hope this clarifies the situation for you and answers your question.