There isn't a standard library function to do the job. There must be a large number of implementations available in the open source world; just about any program that has to deal with HTML will have one.
There are two aspects to the problem:
- Finding the HTML entities in the source string.
- Inserting the appropriate replacement text in their place.
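In principle the shortest possible entity is '&x;', but AFAIK they all use at least 2 characters between the ampersand and the semi-colon, so the shortest real entities (such as '&lt;' or '&#9;') occupy 4 bytes. The longest possible UTF-8 character representation is also 4 bytes, and a code point that needs all 4 can only be written as a numeric reference of at least 9 characters (such as '&#x10000;'). So decoding a numeric reference or an HTML 4 named entity always shortens the string. Hence, it is possible to edit in situ safely. One caveat: HTML5 adds some names that map to two code points, and I believe at least '&nGt;' and '&nLt;' decode to 6 bytes of UTF-8 from a 5-character entity, so check your table if you support those.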
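To make the in situ argument concrete, here is a minimal sketch in C (my assumption about the language in play) that handles only numeric references. It keeps a read pointer and a write pointer into the same buffer; the write pointer can never overtake the read pointer because each replacement is shorter than the entity it replaces. The names 'utf8_encode', 'digit_value' and 'decode_numeric_in_place' are made up for this illustration, and anything that is not a well-formed numeric reference is copied through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Encode code point cp as UTF-8 into out[0..3]; returns the byte count,
 * or 0 if cp is a surrogate or lies beyond U+10FFFF. */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Value of c as a digit in the given base (10 or 16), or -1. */
static int digit_value(char c, int base)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (base == 16 && c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (base == 16 && c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Replace "&#NNN;" and "&#xHHH;" in s with UTF-8, editing in place.
 * Returns the new length of the string. */
size_t decode_numeric_in_place(char *s)
{
    char *src = s;      /* next byte to read */
    char *dst = s;      /* next byte to write; never ahead of src */

    while (*src != '\0') {
        if (src[0] == '&' && src[1] == '#') {
            char *p = src + 2;
            char *digits;
            unsigned long cp = 0;
            unsigned char buf[4];
            size_t n = 0;
            int base = 10;
            int v;

            if (*p == 'x' || *p == 'X') {
                base = 16;
                p++;
            }
            digits = p;
            /* Stop accumulating once cp is out of range, so it cannot overflow. */
            while (cp <= 0x10FFFF && (v = digit_value(*p, base)) >= 0) {
                cp = cp * (unsigned long)base + (unsigned long)v;
                p++;
            }
            /* Code point 0 is rejected: it would truncate the C string. */
            if (p != digits && *p == ';' && cp != 0)
                n = utf8_encode(cp, buf);
            if (n > 0) {
                memcpy(dst, buf, n);
                dst += n;
                src = p + 1;        /* skip past the ';' */
                continue;
            }
        }
        *dst++ = *src++;            /* ordinary byte or malformed entity */
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

Called on a writable buffer holding 'x &#223; y', this should leave the 6-byte string 'x ß y' behind. The hand-written digit loop is a deliberate choice over strtoul(), which would also accept leading white space, a sign, or a '0x' prefix that an entity parser should not.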
There's an illustration of HTML entity decoding in 'The Practice of Programming' by Kernighan and Pike, though it is done somewhat 'in passing'. They use a tokenizer to recognize the entity, and a sorted table of entity names plus the replacement value so that they can use a binary search to identify the replacements. This is only needed for the non-algorithmic entity names. For entities encoded as '&#223;', you use an algorithmic technique to decode them.
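The sorted-table idea might look roughly like this in C. This is my own sketch, not the code from the book: the table holds only a handful of entries for illustration (a real one would carry the full entity list), and 'struct entity', 'lookup_entity' and 'match_named_entity' are hypothetical names. The lookup relies on the standard bsearch(), which requires the table to be sorted in the same order the comparison function uses.

```c
#include <stdlib.h>
#include <string.h>

struct entity {
    const char *name;   /* entity name without '&' and ';' */
    const char *utf8;   /* replacement text, UTF-8 encoded */
};

/* Illustrative subset only. Must stay sorted by name in strcmp() order,
 * because bsearch() depends on it. Entity names are case-sensitive. */
static const struct entity entities[] = {
    { "amp",   "&" },
    { "apos",  "'" },
    { "gt",    ">" },
    { "lt",    "<" },
    { "nbsp",  "\xC2\xA0" },
    { "quot",  "\"" },
    { "szlig", "\xC3\x9F" },
};

/* bsearch() passes the key first and the table element second. */
static int entity_cmp(const void *key, const void *elem)
{
    const char *name = key;
    const struct entity *e = elem;
    return strcmp(name, e->name);
}

/* Return the replacement for a NUL-terminated entity name, or NULL. */
const char *lookup_entity(const char *name)
{
    const struct entity *e = bsearch(name, entities,
                                     sizeof entities / sizeof entities[0],
                                     sizeof entities[0], entity_cmp);
    return e != NULL ? e->utf8 : NULL;
}

/* If text starts with a known "&name;" entity, return its replacement and
 * store the number of source bytes it occupies in *consumed; else NULL. */
const char *match_named_entity(const char *text, size_t *consumed)
{
    char name[16];      /* long enough for this small table */
    const char *semi;
    const char *repl;
    size_t len;

    if (text[0] != '&')
        return NULL;
    semi = strchr(text + 1, ';');
    if (semi == NULL)
        return NULL;
    len = (size_t)(semi - (text + 1));
    if (len == 0 || len >= sizeof name)
        return NULL;
    memcpy(name, text + 1, len);
    name[len] = '\0';
    repl = lookup_entity(name);
    if (repl != NULL)
        *consumed = len + 2;    /* name plus '&' and ';' */
    return repl;
}
```

A decoder would call something like 'match_named_entity' whenever it sees an ampersand not followed by '#', and fall back to copying the ampersand literally when the lookup fails.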