There's been quite a lot of help around this already, but I am still confused.
I have a unicode string like this:
title = u'test'
title_length = len(title) #5
But! I need len(title) to be 6. The clients expect it to be 6 because they seem to count in a different way than I do on the backend.
As a workaround I have written this little helper, but I am sure it can be improved (with enough knowledge about encodings) or perhaps it's even wrong.
title_length = len(title) + repr(title).count('\\U') #6
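A minimal sketch of the same idea without going through repr(). It assumes the clients count UTF-16 code units, which is what JavaScript's String.length and Java's String.length() return. Every character above U+FFFF (outside the Basic Multilingual Plane) becomes a surrogate pair in UTF-16 and counts as 2. The helper name utf16_length and the sample character U+1F600 are illustrative assumptions.

```python
def utf16_length(text):
    """Length of `text` in UTF-16 code units (hypothetical helper name).

    Characters above U+FFFF need a surrogate pair in UTF-16, so they
    count as 2; every other character counts as 1.
    """
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)


title = u'test\U0001F600'  # 'test' plus one astral character (assumed example)
print(utf16_length(title))  # 6
```

Because it looks at code points directly, this does not depend on how repr() happens to escape the string.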
1. Is there a better way of getting the length to be 6? :-)
I assume that I (Python) am counting the number of Unicode characters, which is 5. Are the clients counting the number of bytes?
2. Would my logic break for other Unicode characters, for example ones that need 4 bytes?
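One way to check which count the clients use is to compare the candidates side by side. The sketch below prints the code-point count, the UTF-8 byte count and the UTF-16 code-unit count. The helper names and sample strings are illustrative assumptions. For 'test' plus one astral character, only the UTF-16 code-unit count comes out as 6 (UTF-8 gives 8 bytes, UTF-16 gives 12 bytes). That suggests the clients count UTF-16 code units rather than bytes. The last lines show one input where the repr() trick miscounts: a literal backslash followed by U.

```python
def counts(text):
    """Return (code points, UTF-8 bytes, UTF-16 code units) for `text`."""
    return (
        len(text),                           # code points (UCS-4 build or Python 3.3+)
        len(text.encode('utf-8')),           # bytes, if clients counted UTF-8
        len(text.encode('utf-16-le')) // 2,  # UTF-16 code units, 2 bytes each, no BOM
    )


def repr_workaround(text):
    """The workaround from the question, kept for comparison."""
    return len(text) + repr(text).count('\\U')


print(counts(u'test\U0001F600'))  # (5, 8, 6): only the UTF-16 count is 6
print(counts(u'caf\u00e9'))       # (4, 5, 4): UTF-8 bytes differ even inside the BMP

# A literal backslash followed by 'U' also matches '\\U' in repr():
print(repr_workaround(u'a\\Ub'))  # 5, although the string has 4 characters
print(counts(u'a\\Ub'))           # (4, 4, 4)
```

The repr() trick also relies on Python 2 escaping every non-ASCII character in repr(). Python 3's repr() leaves printable characters such as emoji unescaped, so the same trick would not count them there.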
I'm running Python 2.7 (UCS-4 build).