Here is my code:

import requests
from lxml.etree import HTML
title_req = requests.get("https://www.youtube.com/watch?v=VK3QWm7jvZs")
title_main = HTML(title_req.content)
title = title_main.xpath("//span[@id='eow-title']/@title")[0]
print(title)
>> Halsey - Without Me - Ù\x85ترجÙ\x85Ø© عربÙ\x8a

I want it to be like this:

>> Halsey - Without Me - مترجمة عربي

I tried adding UTF-8 encoding, but it's not working.

Thanks.

xzoz
  • Possible duplicate of [How to print Unicode character in Python?](https://stackoverflow.com/questions/10569438/how-to-print-unicode-character-in-python) – Agi Hammerthief Feb 02 '19 at 08:40
  • No, it's not the same problem, thanks. – xzoz Feb 02 '19 at 08:49
  • A question can be marked as duplicate if the *solution* is the same, even if the problem description is different. If one of the solutions on the other question solves the problem, the question is considered a duplicate. – Agi Hammerthief Feb 02 '19 at 08:52
  • I tried the solutions from the question you marked as a possible duplicate, but they don't work, because the problem comes from **lxml**, not UTF-8. – xzoz Feb 02 '19 at 08:56
  • Which console do you use: cmd, bash, or PowerShell? – Rahul Feb 02 '19 at 08:56
  • Try writing the file with `with open('urls.txt', 'w', encoding='utf-8') as f: f.write(title)` – Rahul Feb 02 '19 at 08:57
  • Visual Studio Code, but the problem appears both when I save the string to a file and when I print it. – xzoz Feb 02 '19 at 08:57
  • Visual Studio Code doesn't have its own console; it uses the system's. I guess it's a console problem. – Rahul Feb 02 '19 at 08:58
  • when i use your code is save this in the urls.txt: Halsey - Without Me - ÙØªØ±Ø¬ÙØ© عرب٠– xzoz Feb 02 '19 at 09:01
  • I checked it on Linux and got the same problem. Looks like you have a real issue here. – Rahul Feb 02 '19 at 09:01

1 Answer

I don't know why, but this line is causing the problem:

title_main = HTML(title_req.content)

Change it to:

title_main = HTML(title_req.text)

I will try to find out why.
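
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the garbled title in the question looks like UTF-8 bytes that were decoded as Latin-1. That would be consistent with lxml guessing the wrong encoding when it is handed raw bytes, while `title_req.text` is already decoded by requests. The snippet below reproduces the effect with the standard library only; the variable names are illustrative, and it assumes the page is served as UTF-8.

```python
# Stand-in for the page title; in the question it is scraped from the page.
title = "Halsey - Without Me - مترجمة عربي"

# response.content holds raw bytes (assumed here to be UTF-8).
raw_bytes = title.encode("utf-8")

# Decoding those bytes with the wrong codec produces mojibake
# similar to the output shown in the question.
garbled = raw_bytes.decode("latin-1")

# Decoding with the correct codec restores the original title,
# which is what passing already-decoded text to the parser relies on.
decoded = raw_bytes.decode("utf-8")

print(garbled == title)  # the wrongly decoded string differs from the title
print(decoded == title)  # the correctly decoded string matches it
```

If the bytes must be parsed directly, telling the parser the encoding explicitly is another option worth trying.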

Rahul
  • On Python 3.5.2 on Windows 10, I get the following error message: "UnicodeEncodeError: 'charmap' codec can't encode characters in position 22-27: character maps to <undefined>". What specific version of Python 3 are you using? – Agi Hammerthief Feb 02 '19 at 09:05
  • `r.text` is the content of the response in unicode, and `r.content` is the content of the response in bytes. – Denis Rasulev Feb 02 '19 at 09:11
  • Python 3.6.3 64-bit with Visual Studio Code, and it **WORKED**. – xzoz Feb 02 '19 at 09:13
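
The `UnicodeEncodeError` reported in the comments is a separate issue from the parsing fix: it usually means the output stream uses a legacy code page that has no Arabic letters. The sketch below illustrates this with `cp1252` as an assumed example (the actual console code page may differ) and shows the explicit-encoding approach to writing the file that was suggested in the comments. The temporary `urls.txt` path is only for illustration.

```python
import os
import tempfile

title = "Halsey - Without Me - مترجمة عربي"

# Encoding to a legacy Windows code page fails because it has no Arabic letters.
# cp1252 is an assumption; the real console encoding depends on the system.
try:
    title.encode("cp1252")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc
    print(exc)

# Writing with an explicit UTF-8 encoding avoids relying on the platform default.
path = os.path.join(tempfile.mkdtemp(), "urls.txt")  # illustrative output file
with open(path, "w", encoding="utf-8") as f:
    f.write(title)

# Reading it back with the same encoding returns the original string.
with open(path, encoding="utf-8") as f:
    round_tripped = f.read()
```

The first unencodable character is the first Arabic letter, right after the ASCII prefix of the title.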