I'd like to write a regular expression which will match all accented forms of a particular character in text encoded using some Unicode encoding, without explicitly listing out all such forms in a character class.
So, for example, if I'd like to match any accented version of `a`, `[aàáâãäå]` is insufficient, as it gets only the `a`'s which live in ISO-8859-1, and there may well be other accents which don't occur there. Something which would be acceptable is something like `\p{Base_Character: a}`, were there such a thing defined in Unicode. Does something which does this exist?
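To make the gap concrete, here is a minimal sketch in Python. Python is only an assumption for illustration (the regex engine actually in play isn't specified here), and the helper name `matches_latin1_a` is hypothetical. It shows the explicit class accepting a Latin-1 form but rejecting an accented `a` from outside ISO-8859-1, as well as a decomposed form (base letter followed by a combining accent).

```python
import re

# Same class as [aàáâãäå], written with escapes so the source file's own
# encoding or normalization cannot change what the pattern contains.
LATIN1_A = re.compile("[a\u00e0\u00e1\u00e2\u00e3\u00e4\u00e5]")


def matches_latin1_a(text: str) -> bool:
    """Return True if the whole string is one character from the explicit class."""
    return LATIN1_A.fullmatch(text) is not None


if __name__ == "__main__":
    samples = {
        "precomposed a-acute (in ISO-8859-1)": "\u00e1",
        "precomposed a-macron (not in ISO-8859-1)": "\u0101",
        "decomposed a + combining acute": "a\u0301",
    }
    for label, text in samples.items():
        print(f"{label}: {matches_latin1_a(text)}")
```

Only the first sample is accepted; the other two are exactly the cases an explicit list cannot keep up with.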
Edit: I can't ASCIIfy the string first, because the string is in a database I don't have direct access to. In fact, I don't have code-level access to anything here. The only input I can give is a regex.