I'm using python to extract contents of one web page. The html content that I focus on has some Chinese characters inside, together with other usual characters.
Then, I tried to print the html tag and its content, the printed texts are all messy code. Like below shows:
<h4>绔彛:443</h4>
<h4>A瀵嗙爜:</h4>
<h4>鍔犲瘑鏂瑰紡:aes-256-cfb</h4>
The original content are as follows:
<h4>端口:443</h4>
<h4>A远端:</h4>
<h4>加密方式:aes-256-cfb</h4>
Could you please help me how to print out the correct content in the console?
I'm using python 2.7. The code snippet is as shown below:
Adding one update:
After I tried Shiva's proposal, using the lxml way, I got the result shown as below capture:
Could you please tell me how to display original Chinese characters in Git bash console?
Thank you in advance!
Best regards,
Junma