Why are German umlaut letters not printed correctly in Python?

Question

I want to print some strings scraped with BeautifulSoup, but they are not printed correctly: for "ü" I get "Ã¼", and so on.

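Here is my code:

from bs4 import BeautifulSoup

# No encoding argument: Python 3 decodes the file with the platform's
# default encoding (often cp1252 on Windows), not necessarily UTF-8.
with open('index.html') as html_file:
    soup = BeautifulSoup(html_file, 'html.parser')

for link in soup.find_all('a'):
    print(link.get('href'))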
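The symptom matches UTF-8 bytes being decoded as Windows-1252 (cp1252), which is what `open()` without an `encoding` argument does on many Windows setups. The sketch below reproduces the garbling in memory. It assumes the HTML file is saved as UTF-8 and that the platform default is cp1252; the question states neither. The helper name `mojibake` is hypothetical.

```python
import locale

# Encoding that open() falls back to in Python 3 when no encoding= is given.
print("default text encoding:", locale.getpreferredencoding(False))

def mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode it with the wrong codec,
    mimicking open() reading a UTF-8 file with a cp1252 default."""
    return text.encode("utf-8").decode(wrong_codec)

for word in ["ü", "Öl", "prüfen"]:
    print(word, "->", mojibake(word))
# Expected loop output:
# ü -> Ã¼
# Öl -> Ã–l
# prüfen -> prÃ¼fen
```

Each umlaut is two bytes in UTF-8 (for example `ü` is `C3 BC`), and cp1252 maps each byte to its own character, which is why every umlaut turns into two symbols.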

EDIT

Found the solution myself. You have to open the file with the right encoding as follows:

open('index.html', encoding='utf8')
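The effect of the `encoding` argument can be checked in isolation with a stdlib-only sketch (no BeautifulSoup). It writes a hypothetical UTF-8 sample standing in for `index.html` to a temporary directory and reads it back twice: once as cp1252, assumed here to be the platform default, and once with `encoding='utf8'` as in the fix.

```python
import os
import tempfile

# Hypothetical sample content standing in for index.html.
SAMPLE = '<a href="/prüfen">Öl</a>'

def read_back(path: str, encoding: str) -> str:
    """Read the whole file as text using the given encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")
    # Save the sample as UTF-8, as most editors and web pages do.
    with open(path, "w", encoding="utf8") as f:
        f.write(SAMPLE)

    wrong = read_back(path, "cp1252")  # what a cp1252 default would give
    right = read_back(path, "utf8")    # explicit encoding, as in the fix

print(wrong)  # <a href="/prÃ¼fen">Ã–l</a>
print(right)  # <a href="/prüfen">Öl</a>
```

Alternatively, opening the file in binary mode (`'rb'`) and passing the bytes to BeautifulSoup should let it detect the encoding itself, for example from a `<meta charset>` declaration in the page.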

That is a syntax error, which really has nothing to do with what you are asking. I suspect you are using Python 3, which you should, in which case you need to use `print` as a function, not a statement. The good news is that Python 3 makes all of this much simpler. — juanpa.arrivillaga, Oct 02 '19 at 12:57
How does that address the question about the encoding error? I also tried putting it in parentheses. That does not solve it. — neolith, Oct 02 '19 at 13:02
That answers the question about the **error you are showing**. If you are getting **another error** then edit the question and provide a [mcve]. — juanpa.arrivillaga, Oct 02 '19 at 13:05
That looks like an error related to your *source-file encoding*. You've declared it as `iso-8859-1` using `# -*- coding:`, but it likely is not. Have you tried using utf-16? Also, **please provide a [mcve]**: not your whole code, but a snippet that actually reproduces your problem, which is likely something that can be one or two lines. — juanpa.arrivillaga, Oct 02 '19 at 13:09
Just do `print(message)`, whatever encoding this is, it is already a string, no decoding necessary. — L3viathan, Oct 02 '19 at 13:09
... why are you doing `print (message.decode("iso-8859-1").encode(stdout_encoding))` ??? It doesn't make sense to `.decode` a string. — juanpa.arrivillaga, Oct 02 '19 at 13:10
@juanpa.arrivillaga because they copied some code from somewhere old that was assuming Python 2. — L3viathan, Oct 02 '19 at 13:10
I am doing `print (message.decode("iso-8859-1").encode(stdout_encoding))` because that is the approach in the tutorial. If I just use `print(message)`, it doesn't print ä, ö and ü correctly. I have hundreds of lines and would have to edit them by hand. — neolith, Oct 02 '19 at 13:12
@neolith don't just blindly follow a tutorial, especially since it is clearly for Python 2 not Python 3, which handle strings fundamentally incompatibly. What happens when you `print(message)` **exactly**? Are you sure the problem isn't simply that the terminal you are using doesn't support the encoding you are trying to use? Perhaps get a better terminal, or look for instructions on how to change your terminal settings. — juanpa.arrivillaga, Oct 02 '19 at 13:14
Your tutorial is 13 years old, things have changed since then. — L3viathan, Oct 02 '19 at 13:15
Yes, you are right. I hadn't looked at the date that closely. How would you approach it for Spanish and ñ? — neolith, Oct 02 '19 at 13:19
@neolith the *language* has no bearing. Again, what exactly is the behavior you see when you simply `print(message)`? — juanpa.arrivillaga, Oct 02 '19 at 13:21
Words like "Öl" are printed as "Ã–l" and "prüfen" becomes "prÃ¼fen" — neolith, Oct 02 '19 at 13:23
What happens if you remove the "encoding cookie"? What if you run a script whose only line (!) is `print("Öl")`? — L3viathan, Oct 02 '19 at 13:23
Found the solution. I had to open the file with the right encoding, as follows: `open('index.html', encoding='utf8')` — neolith, Oct 02 '19 at 13:24
Yes, that is the problem with minimal examples: I am a noob and don't know which lines might be relevant. I changed the question accordingly and added the solution. Thanks a lot for your help, tronco! — neolith, Oct 02 '19 at 13:31
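The point about `.decode` in the comments comes from Python 3 keeping text (`str`) and raw bytes (`bytes`) as separate types, unlike the Python 2 code in the tutorial. A small sketch with a hypothetical `message` value: a `str` has no `.decode()`, only `bytes` do, and text can be printed directly.

```python
import sys

message = "prüfen"  # hypothetical value; already decoded text (str) in Python 3

# The Python 2 idiom fails: str has no .decode() method in Python 3.
try:
    message.decode("iso-8859-1")
except AttributeError as exc:
    print("AttributeError:", exc)

# Decoding applies only to bytes, e.g. data read from a file opened with 'rb'.
raw = message.encode("utf-8")
assert raw.decode("utf-8") == message

# Print the str directly. Whether ü then displays correctly also depends on
# the encoding used for console output, which Python reports here.
print("stdout encoding:", sys.stdout.encoding)
print(message)
```

If `print(message)` still shows garbled characters after the file is read with the right encoding, the terminal's encoding, as suggested in the comments, is the next thing to check.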
