Python automatically reads German umlauts and punctuation as

Gefrier- und TiefkÃ¼hlmÃ¶bel

How do I normalize this output to remove punctuation?

SSurfer
What are you using to read the input? Is it `input(...)`? You seem to have an encoding issue, probably mixing `latin-1` with `utf-8`. Are you on Windows? – JoseKilo Oct 16 '20 at 12:00

1 Answer

You could "fix" the encoding issue by doing:

the_string = 'Gefrier- und TiefkÃ¼hlmÃ¶bel'.encode('latin-1').decode('utf-8')
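To see why that round trip works, here is a minimal sketch that reproduces the damage on purpose. It assumes the input was UTF-8 bytes decoded as `latin-1`; the variable names are only illustrative.

```python
original = 'Gefrier- und Tiefkühlmöbel'

# Decoding UTF-8 bytes with the wrong codec (latin-1) yields the garbled form:
# each umlaut is two bytes in UTF-8, so it shows up as two characters.
garbled = original.encode('utf-8').decode('latin-1')

# Undoing both steps in reverse order recovers the original text.
repaired = garbled.encode('latin-1').decode('utf-8')

print(garbled)
print(repaired)
```

If the garbled text came from a different wrong codec (for example `cp1252` on Windows), the same idea applies with that codec name in place of `latin-1`.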

And then apply a solution like this one: https://stackoverflow.com/a/518232/2452074

import unicodedata

def strip_accents(s):
    # Decompose each character (NFD), then drop the combining marks ('Mn')
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

strip_accents(the_string)
> 'Gefrier- und Tiefkuhlmobel'
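Note that `strip_accents` leaves the hyphen in place. If you also want to drop punctuation, as the question asks, one possible sketch in the same style is below. It assumes "punctuation" means the Unicode `P*` categories, and `strip_punctuation` is a hypothetical helper name, not something from the linked answer.

```python
import unicodedata

def strip_punctuation(s):
    # Keep every character whose Unicode category is not a punctuation
    # category (Pd, Po, Ps, Pe, ...); letters and spaces are untouched.
    return ''.join(c for c in s
                   if not unicodedata.category(c).startswith('P'))

result = strip_punctuation('Gefrier- und Tiefkühlmöbel')
print(result)  # Gefrier und Tiefkühlmöbel
```

The two helpers are independent, so you can apply either one or both depending on how much normalization you need.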

But first, I would try to understand why your input looks broken. Python itself shouldn't do that automatically; it usually means the data was decoded with a different encoding than the one it was written in.
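For example, if the text comes from a file, the codec passed to `open` decides how the bytes are interpreted. The sketch below assumes a UTF-8 file (a temporary file created just for the demonstration) and reads it back twice, once with the wrong codec and once with the right one.

```python
import os
import tempfile

text = 'Gefrier- und Tiefkühlmöbel'

# Hypothetical input file, written as UTF-8 for this demonstration only
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt',
                                 delete=False) as f:
    f.write(text)
    path = f.name

try:
    # Wrong codec: every UTF-8 byte is read as one latin-1 character
    with open(path, encoding='latin-1') as f:
        wrong = f.read()
    # Matching codec: the umlauts come back intact
    with open(path, encoding='utf-8') as f:
        right = f.read()
finally:
    os.remove(path)

print(wrong)
print(right)
```

Passing `encoding=` explicitly avoids depending on the platform default, which is one common way this kind of mismatch creeps in.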

Some background docs on Unicode and encodings: https://docs.python.org/3/howto/unicode.html

JoseKilo