I have a document with some special characters, such as non-breaking spaces and non-breaking hyphens. I want to normalize this document by replacing these special characters with a plain space. In addition, since the content of this document is gathered from different sources, it contains different forms of "Yeh" (ی), and I want to normalize them as well.
Is it possible to find and replace Unicode characters in a document using the sed command? Can I use Unicode code points instead of the surface form of the character? For example, can I write x00a0 instead of a literal non-breaking space in a sed command? If so, how?
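For illustration, here is the same idea (replacing characters identified only by their code points) sketched in Python rather than sed. The set of code points and the helper names `normalize_spaces` and `normalize_file` are assumptions chosen for this example; the input is assumed to be UTF-8, as in the documents described here.

```python
# Map code points (not the invisible characters themselves) to replacements.
# This particular set is an assumption; extend it to match your data.
SPECIAL_TO_SPACE = {
    0x00A0: " ",  # NO-BREAK SPACE
    0x2011: " ",  # NON-BREAKING HYPHEN
    0x202F: " ",  # NARROW NO-BREAK SPACE
}


def normalize_spaces(text: str) -> str:
    """Replace every code point listed in SPECIAL_TO_SPACE with a plain space."""
    return text.translate(SPECIAL_TO_SPACE)


def normalize_file(src: str, dst: str) -> None:
    """Hypothetical helper: read a UTF-8 file, normalize it, write the result."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(normalize_spaces(text))


if __name__ == "__main__":
    print(repr(normalize_spaces("a\u00a0b\u2011c")))  # 'a b c'
```

`str.translate` accepts a dictionary keyed by integer code points, so the table can be written entirely with hexadecimal codes, which is what the question asks for. Characters such as ZERO WIDTH NON-JOINER (U+200C) are deliberately left out, because replacing them with a space would change the spelling of Persian words.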
Sorry for the poor explanation. My documents are encoded in UTF-8 and contain non-English characters: for example, I have one document in Arabic, one in Urdu, and one in Persian (Farsi). I want to replace some of the characters in these files with other characters. By normalizing, I mean replacing all forms of "Yeh" with a single form. (As you might know, several forms of this character are used in Arabic script, but to simplify some of my processing I want to unify all of them.)
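A minimal sketch of the Yeh unification, again in Python and again keyed by code points. The target form (FARSI YEH, U+06CC) and the list of variants are assumptions for this example, as is the name `unify_yeh`; the optional NFKC step is an added technique for folding Arabic presentation forms (for instance U+FEF1, YEH ISOLATED FORM), which often appear in text copied from PDFs or older sources.

```python
import unicodedata

# Assumed target form: ARABIC LETTER FARSI YEH (U+06CC), the "ی" in the question.
TARGET_YEH = "\u06cc"

# Assumed list of variants to fold into the target; adjust for your corpus.
YEH_VARIANTS = {
    0x064A: TARGET_YEH,  # ARABIC LETTER YEH
    0x0649: TARGET_YEH,  # ARABIC LETTER ALEF MAKSURA
}


def unify_yeh(text: str, fold_presentation_forms: bool = True) -> str:
    """Map every listed Yeh variant to TARGET_YEH."""
    if fold_presentation_forms:
        # NFKC turns presentation forms such as U+FEF1 back into base letters
        # (U+064A here); note it also rewrites other compatibility characters.
        text = unicodedata.normalize("NFKC", text)
    return text.translate(YEH_VARIANTS)


if __name__ == "__main__":
    for ch in "\u064a\u0649\ufef1":
        out = unify_yeh(ch)
        print(unicodedata.name(ch), "->", unicodedata.name(out))
```

Whether ALEF MAKSURA belongs in the table depends on the language: in Persian text it usually stands in for Yeh, but in Arabic it is a distinct letter in final position. Likewise, the Urdu YEH BARREE (U+06D2) is a separate letter and is not folded here.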