How to convert MS dot character to Unicode

Asked Apr 08 '19 at 11:44

Active Apr 08 '19 at 11:44

I'm working with text that was produced in MS Word and pasted into a form textbox. It contains bullet characters introducing list items, which are encoded as \x95. I want to convert these to the Unicode bullet U+2022 using preg_replace.

I can't persuade that function to recognise the character, either as a literal or as Unicode \u0095, in lines like:

preg_replace("/•/","\u2022", $text);
preg_replace("/\u0095/u","\u2022", $text);

What is the correct way to do this?

asked Apr 08 '19 at 11:44

magnol

Your `•` char is actually a `\u2022` char. What exactly are you doing? – Wiktor Stribiżew Apr 08 '19 at 11:50
@WiktorStribiżew: Interesting. When I grabbed the original text and put it through the Hex converter in UltraEdit, it came out as \x65, with no characters around it that suggest it might be UTF-8 or Unicode. If I do the same with the code example above, it appears as literal '\u2022'. It seems that there's been a character conversion between my posting the character and it appearing here. How do I reproduce this? – magnol Apr 08 '19 at 12:30
What if you use `str_replace("\x95", "\u{2022}", $text)`? – Wiktor Stribiżew Apr 08 '19 at 12:53
Nothing happens. I wonder if the text to hex conversion in UltraEdit is actually showing me only one element of the UTF-8 character string. But the text view displays the bullet. – magnol Apr 09 '19 at 08:17
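
For readers hitting the same problem, here is a minimal sketch of one way to approach it. It is not a confirmed fix for this thread. It assumes PHP 7 or later with the mbstring extension, and the helper name `normalize_bullets` is hypothetical. Two details matter: PHP only interprets the braced form `"\u{2022}"` in double-quoted strings (a bare `"\u2022"` stays literal text), and the byte 0x95 only means "bullet" in Windows-1252. If the form submits UTF-8, the bullet is most likely already U+2022 (bytes `e2 80 a2`), which would explain why replacing `"\x95"` does nothing.

```php
<?php
// Hypothetical helper, not from the thread: maps Word-style bullets to U+2022.
// Assumes the mbstring extension is available.
function normalize_bullets(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        // Already valid UTF-8. A Windows-1252 bullet that was mis-decoded as
        // ISO-8859-1 shows up as the C1 control U+0095 (bytes c2 95).
        // A real U+2022 bullet is left untouched.
        return str_replace("\u{0095}", "\u{2022}", $text);
    }

    // Not valid UTF-8: assume the bytes are Windows-1252 and convert the
    // whole string, so that 0x95 becomes U+2022 and the result is
    // consistently UTF-8.
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

// Inspect the raw bytes to see what the form actually received.
echo bin2hex("\u{2022}"), "\n";                     // UTF-8 bytes of U+2022
echo bin2hex(normalize_bullets("\x95 item")), "\n"; // after normalisation
```

Running `bin2hex()` on the submitted `$text` before any replacement is probably the quickest way to find out which of the two cases applies.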
