
Duplicate of 249087

I have a bunch of user generated addresses that may contain characters with diacritic marks. What is the most effective (i.e. generic) way (apart from a straightforward replace) to automatically convert any such characters to their closest English equivalent?

For example:

- any of àâãäå would become a
- æ would become the two separate letters ae
- ç would become c
- any of èéêë would become e

and so on for all possible letter variations (preferably without having to find and encode a lookup for each diacritic form of each letter).

(Note: I have to pass these addresses on to third party software that is incapable of printing anything other than English characters. I'd rather the software was capable of handling them, but I have no control over that.)
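A common generic approach is Unicode decomposition: normalize the string to a decomposed form (NFKD), which splits a letter like é into a base e plus a combining accent, and then drop the combining marks. Below is a minimal Python sketch of that idea, assuming Python 3 and its standard `unicodedata` module. The `to_ascii` name and the `SPECIAL_CASES` table are hypothetical illustrations. The table is needed because some letters (æ, ø, ß, ł and similar) are not composed of a base letter plus an accent, so decomposition alone will not turn them into ASCII. It is deliberately not exhaustive.

```python
import unicodedata

# Hypothetical, non-exhaustive table for letters that NFKD does not split
# into "base letter + combining mark" (they would otherwise be dropped).
SPECIAL_CASES = {
    "æ": "ae", "Æ": "AE",
    "œ": "oe", "Œ": "OE",
    "ø": "o", "Ø": "O",
    "ß": "ss",
    "đ": "d", "Đ": "D",
    "ł": "l", "Ł": "L",
    "þ": "th", "Þ": "Th",
}


def to_ascii(text: str) -> str:
    """Approximate `text` with plain ASCII letters (illustrative helper)."""
    # 1. Replace letters that Unicode decomposition cannot split apart.
    text = "".join(SPECIAL_CASES.get(ch, ch) for ch in text)
    # 2. NFKD turns e.g. "é" into "e" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", text)
    # 3. Drop the non-spacing combining marks (category "Mn"), keeping base letters.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # 4. Anything still outside ASCII has no simple equivalent, so discard it.
    return stripped.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("Crème Brûlée Straße"))
```

Run as a script, this prints `Creme Brulee Strasse`. Note that "closest English equivalent" is partly a judgment call: this sketch maps ü to u, whereas German convention often writes it as ue, so you may want to extend the table for your own data.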

EDIT: Never mind... Found the answer [here][2]. It showed up in the "Related" section to the right of the question after I posted, but not in my prior search or as a pre-post suggestion. Hmm. I added the 'diacritics' tag to the other question in any case.

EDIT 2: Jeez! Who voted this -1 after I closed it?

Andrew Rollings

1 Answer


I was just going to post the same link :-)

Sounds like you're doing this already, but I would recommend that you store the original string for display in your application, and only do the conversion for the 3rd-party stuff. People get cranky if they don't think their real name is important :-)

devstuff