
Hi guys!

I'm trying to parse this URL http://mapia.ua/ru/search?&city=%D0%9D%D0%B8%D0%BA%D0%BE%D0%BB%D0%B0%D0%B5%D0%B2&page=1&what=%D0%BE%D0%BE%D0%BE using BeautifulSoup.

But I get strange characters like this: ��� �1 ��� "����"

Here is my code:

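from bs4 import BeautifulSoup
import urllib.request

# Fetch the search results page as raw bytes.
html = urllib.request.urlopen('http://mapia.ua/ru/search?city=%D0%9D%D0%B8%D0%BA%D0%BE%D0%BB%D0%B0%D0%B5%D0%B2&what=%D0%BE%D0%BE%D0%BE&page=1').read()

# Parse the bytes; BeautifulSoup detects the encoding itself.
soup = BeautifulSoup(html, 'html.parser')

# Print the text of the first <h3> heading.
print(soup.h3.get_text())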

Can anybody help me?

P.S. I'm using Python 3.

andrii1986
The issue is with the shell you are using to output the data. I get `ЖЭК №1 ООО "Дуэт"` because my default encoding is utf-8; your accepted answer actually causes it to not work. – Padraic Cunningham Jun 02 '16 at 19:33
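To illustrate the point in the comment above, here is a minimal sketch. It assumes the page bytes are valid UTF-8 and that the console's encoding cannot represent Cyrillic. `console_bytes` and `force_utf8_stdout` are hypothetical helper names, and `sys.stdout.reconfigure` needs Python 3.7 or later. The heading string is taken from the comment, so no network request is made.

```python
import sys

def console_bytes(text, encoding):
    """Return the bytes a console using `encoding` would receive for `text`.

    Characters the encoding cannot represent are replaced with '?',
    which has the same shape as the output shown in the question.
    """
    return text.encode(encoding, errors="replace")

def force_utf8_stdout():
    """Switch sys.stdout to UTF-8 where supported (Python 3.7+).

    Returns True if the stream was reconfigured, False otherwise.
    """
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")
        return True
    return False

# Heading text quoted in the comment, used instead of a live request.
heading = 'ЖЭК №1 ООО "Дуэт"'

# What a console limited to ASCII would receive: b'??? ?1 ??? "????"'
lossy = console_bytes(heading, "ascii")

# What a UTF-8 console would receive: the text survives a round trip.
lossless = console_bytes(heading, "utf-8")
```

If this is what is happening, calling `force_utf8_stdout()` before `print(soup.h3.get_text())`, or setting the `PYTHONIOENCODING=utf-8` environment variable before starting Python, may be enough, with no change to the parsing code.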

1 Answer


I found this:

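import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen('http://python.org/') as response:
    html = response.read()
# Drop any bytes that are not valid UTF-8, then re-encode and parse.
soup = BeautifulSoup(html.decode('utf-8', 'ignore').encode("utf-8"), 'html.parser')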

From:

How to correctly parse UTF-8 encoded HTML to Unicode strings with BeautifulSoup?

Also:

Delete every non utf-8 symbols from string
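As an alternative to hard-coding 'utf-8', the charset can be read from the response's Content-Type header. The sketch below is an assumption-laden illustration, not part of the linked answers: `decode_body` is a hypothetical helper, and the `Message` object stands in for `response.headers` from `urllib.request.urlopen()` so that it runs without network access.

```python
from email.message import Message

def decode_body(raw, headers, fallback="utf-8"):
    """Decode response bytes using the charset from the Content-Type header.

    `headers` is expected to behave like `response.headers` returned by
    urllib.request.urlopen(), which provides get_content_charset().
    Falls back to `fallback` when the header declares no charset.
    """
    charset = headers.get_content_charset() or fallback
    # Undecodable bytes become U+FFFD instead of being silently dropped.
    return raw.decode(charset, errors="replace")

# Stand-in for a real response (hypothetical header and body values).
fake_headers = Message()
fake_headers["Content-Type"] = "text/html; charset=utf-8"
fake_body = "<h3>ЖЭК №1</h3>".encode("utf-8")

html = decode_body(fake_body, fake_headers)
```

With a real response this would be `decode_body(response.read(), response.headers)`, and the resulting string can be passed to `BeautifulSoup(html, 'html.parser')`.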

Hope it helps ;)

Destrif