I need to use UTF-8 characters with dryscrape's `set` method, but when I run the script it raises this error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

My code (for example):

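# -*- coding: utf-8 -*-
import dryscrape

site = dryscrape.Session()
site.visit("https://www.website.com")
search = site.at_xpath('//*[@name="search"]')  # the search input field
search.set(u'فارسی')  # Persian text to type into the field
search.form().submit()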

I also tried changing `u'فارسی'` to `search.set(unicode('فارسی', 'utf-8'))`, but it shows the same error.
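
The error message itself can be reproduced without dryscrape. It appears whenever a string containing non-ASCII characters is encoded with the `ascii` codec, either explicitly or implicitly (Python 2 does this implicitly when a `unicode` object is converted to `str` with the default `ascii` encoding). The sketch below is Python 3 and only shows where the message comes from; which call inside dryscrape or its dependencies performs the ASCII encoding is not established in this thread.

```python
# Minimal reproduction (Python 3, no dryscrape involved)
text = "\u0641\u0627\u0631\u0633\u06cc"  # "فارسی": five letters, all outside ASCII (0-127)

message = None
try:
    text.encode("ascii")  # explicit ASCII encoding of non-ASCII text
except UnicodeEncodeError as exc:
    message = str(exc)
print(message)
# 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

# The same text encodes without error as UTF-8 (two bytes per letter)
encoded = text.encode("utf-8")
print(encoded)
# b'\xd9\x81\xd8\xa7\xd8\xb1\xd8\xb3\xdb\x8c'
```

The positions `0-4` in the message are the five letters of the word, which is why the question's error reports exactly that range.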

mySun
  • Did you try `search.set(u'فارسی'.encode('utf-8'))`? Seriously, you should be using Python 3. It has much better Unicode handling. In the meantime, take a look at [Pragmatic Unicode](http://nedbatchelder.com/text/unipain.html), which was written by SO veteran Ned Batchelder. – PM 2Ring Dec 20 '17 at 07:31
  • First: how do you know the response is Unicode? Second: your application code may not be UTF-8 (then the string it reads will not equal what you typed). – dsgdfg Dec 20 '17 at 07:48
  • @PM2Ring, hi, I tried `search.set(u'فارسی'.encode('utf-8'))` but it shows the same error. I'm using Python 2.7. :( – mySun Dec 20 '17 at 07:49
  • @dsgdfg, hi, I added `# coding=utf-8` as the first line. Sorry, I don't know much English. Could you explain in more detail? – mySun Dec 20 '17 at 07:52
  • Try `search.set("فارسی")`. Your encoding type is iso8859-6 (but the default is `ascii`). Run `import locale; print locale.getdefaultlocale()`; if the output does not include UTF-8, change the default locale encoding at the beginning of your application. If your locale encoding is already UTF-8, you do not need to add anything at the beginning of the Python application! – dsgdfg Dec 20 '17 at 08:11
  • @dsgdfg, thank you. I ran `import locale; print locale.getdefaultlocale()` and it printed `('en_US', 'UTF-8')`. I also used `"فارسی"` in my code, but it shows the same error. :( – mySun Dec 20 '17 at 08:15
  • Basic test: `print u'فارسی' == unicode('فارسی',"utf-8")`. If the output is **True**, you have `UTF-8` encoding; if it is **False**, you have `ASCII` encoding (run this inside your script). – dsgdfg Dec 20 '17 at 08:50
  • If you have `ASCII` encoding, check my first comment! `search.set('\xd9\x81\xd8\xa7\xd8\xb1\xd8\xb3\xdb\x8c'.decode("utf-8"))` should work, because **if UTF-8 encoding is not available, the entire input must be given as byte values.** @mySun – dsgdfg Dec 20 '17 at 08:58
  • @dsgdfg, thank you for your guidance. `print u'فارسی' == unicode('فارسی',"utf-8")` prints True. I used `search.set('\xd9\x81\xd8\xa7\xd8\xb1\xd8\xb3\xdb\x8c'.decode("utf-8"))` but it shows the same error. :( – mySun Dec 20 '17 at 09:19
  • The web page's response cannot be displayed correctly. Read the documentation pages of the Python library: the dependencies of the module you are using change some runtime settings. **This encoding error is not caused by your code.** – dsgdfg Dec 20 '17 at 09:39
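
The checks suggested in these comments can be restated in Python 3, where every `str` is Unicode and there is no separate `unicode` type. The sketch below is an illustration, not something run in the thread: it confirms that the byte escapes quoted above are exactly the UTF-8 encoding of `فارسی`, that decoding them restores the same text (the Python 3 counterpart of `unicode(..., 'utf-8')` and `.decode('utf-8')`), and how to inspect the encodings in effect. It uses `locale.getpreferredencoding` instead of `getdefaultlocale`, which is deprecated in recent Python 3 releases.

```python
import locale
import sys

# "فارسی" written with escapes so the exact code points are unambiguous
text = "\u0641\u0627\u0631\u0633\u06cc"

# The byte string quoted in the comments is the UTF-8 encoding of the text
utf8_bytes = b"\xd9\x81\xd8\xa7\xd8\xb1\xd8\xb3\xdb\x8c"
print(text.encode("utf-8") == utf8_bytes)  # True

# Decoding those bytes (Python 3's version of unicode(b, "utf-8")) restores the text
decoded = utf8_bytes.decode("utf-8")
print(decoded == text)  # True

# Python 3's default codec for implicit str <-> bytes conversions is UTF-8;
# Python 2 reports "ascii" here, which is what its implicit conversions use
print(sys.getdefaultencoding())  # utf-8

# Encoding the locale prefers for text I/O (depends on the environment, e.g. LANG)
print(locale.getpreferredencoding(False))
```

Because all of these checks pass for a correctly typed script, they are consistent with the last comment's conclusion that the failing ASCII conversion happens somewhere other than the literal in the question's code.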

1 Answer

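It's very easy. Instead of filling in the form, percent-encode the query and visit the search URL directly. This works perfectly with Google, and you can try it with any other site if you know its URL parameters.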

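# Python 3: urllib.parse does not exist in Python 2.7
import urllib.parse

import dryscrape as d

d.start_xvfb()  # start a virtual X server for machines without a display
br = d.Session()

# Percent-encode the UTF-8 bytes so the URL contains only ASCII characters
query = urllib.parse.quote("فارسی")
print(query)  # prints: %D9%81%D8%A7%D8%B1%D8%B3%DB%8C
url = "http://google.com/search?q=" + query
br.visit(url)
print(br.xpath('//title')[0].text())
# it prints: Google Search - فارسی
# You can also check it with br.render("url_screenshot.png")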
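
When a site needs more than one parameter, the standard library's `urllib.parse.urlencode` can do the quoting for every value instead of concatenating `quote()` results by hand. The sketch below is a Python 3 illustration of that idea without dryscrape; `build_search_url` is a hypothetical helper, and the URL and the `q` parameter are taken from the answer (other sites use different parameter names).

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base_url, params):
    """Hypothetical helper: append percent-encoded UTF-8 query parameters to base_url."""
    # urlencode encodes each str value as UTF-8 and percent-escapes it (spaces become '+')
    return base_url + "?" + urlencode(params)


query = "\u0641\u0627\u0631\u0633\u06cc"  # "فارسی"
url = build_search_url("http://google.com/search", {"q": query})
print(url)  # http://google.com/search?q=%D9%81%D8%A7%D8%B1%D8%B3%DB%8C

# The URL is pure ASCII, so encoding it with the ASCII codec cannot fail
print(url.isascii())  # True

# Parsing the query string back recovers the original Unicode text
print(parse_qs(urlsplit(url).query)["q"][0] == query)  # True
```

The resulting `url` could then be passed to `br.visit(url)` exactly as in the answer's code.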