Python automatically reads German umlauts and punctuation as:
Gefrier- und TiefkÃ¼hlmÃ¶bel
How do I normalize this output to remove punctuation?
You could "fix" the encoding issue by re-encoding the garbled text as Latin-1 and decoding the resulting bytes as UTF-8:
the_string = 'Gefrier- und TiefkÃ¼hlmÃ¶bel'.encode('latin-1').decode('utf-8')
And then apply a solution like this one: https://stackoverflow.com/a/518232/2452074
import unicodedata

def strip_accents(s):
    # Decompose each character (NFD), then drop the combining marks (category 'Mn').
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

strip_accents(the_string)
> 'Gefrier- und Tiefkuhlmobel'
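Note that stripping accents leaves the hyphen in place. If you also want actual punctuation gone, something along these lines might work. It is only a sketch: it assumes "punctuation" means every character whose Unicode category starts with 'P', and the name strip_punctuation is made up for this example.

```python
import unicodedata

def strip_punctuation(s):
    # Hypothetical helper: keep only characters whose Unicode general
    # category is not a punctuation category (Pd, Po, Ps, Pe, ...).
    return ''.join(c for c in s if not unicodedata.category(c).startswith('P'))

print(strip_punctuation('Gefrier- und Tiefkuhlmobel'))
# Gefrier und Tiefkuhlmobel
```

Whether you should drop the hyphen or replace it with a space depends on what you do with the text afterwards.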
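But first, I would try to understand why your input looks broken; Python itself shouldn't do that automatically. Output like this usually means that UTF-8 bytes were decoded with the wrong codec (for example Latin-1) somewhere along the way.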
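To illustrate how broken input like this can come about, here is a small sketch. It assumes your data comes from a UTF-8 encoded file; the temporary file just stands in for your real input, which I can't see.

```python
import os
import tempfile

text = 'Gefrier- und Tiefkühlmöbel'

# Write a UTF-8 file that stands in for the real input (an assumption).
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt',
                                 delete=False) as f:
    f.write(text)
    path = f.name

try:
    # Reading with the wrong codec produces the garbled text.
    with open(path, encoding='latin-1') as f:
        garbled = f.read()
    # Reading with the codec the file was written in gives the right text.
    with open(path, encoding='utf-8') as f:
        clean = f.read()
finally:
    os.remove(path)

print(garbled)
print(clean)
```

If that matches your situation, passing the right encoding when you read the data is cleaner than repairing the string afterwards.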
Some background docs on unicode and encodings: https://docs.python.org/3/howto/unicode.html