There's a bit of confusion about terminology in your question. A diacritic is a mark that can be added to a letter or other character but generally does not stand on its own. (Unicode also uses the more general term combining character.) What normalize('NFD', ...)
does is to convert precomposed characters into their components.
Anyway, the answer is that œ is not a precomposed character. It's a typographic ligature:
>>> unicodedata.name(u'\u0153')
'LATIN SMALL LIGATURE OE'
The unicodedata
module provides no method for splitting ligatures into their parts. But the data is there in the character names:
import re
import unicodedata
_ligature_re = re.compile(r'LATIN (?:(CAPITAL)|SMALL) LIGATURE ([A-Z]{2,})')
def split_ligatures(s):
"""
Split the ligatures in `s` into their component letters.
"""
def untie(l):
m = _ligature_re.match(unicodedata.name(l))
if not m: return l
elif m.group(1): return m.group(2)
else: return m.group(2).lower()
return ''.join(untie(l) for l in s)
>>> split_ligatures(u'B\u0153uf \u0132sselmeer \uFB00otogra\uFB00')
u'Boeuf IJsselmeer ffotograff'
(Of course you wouldn't do it like this in practice: you'd preprocess the Unicode database to generate a lookup table as you suggest in your question. There aren't all that many ligatures in Unicode.)