I am fetching the latest football scores from a website and sending a notification to the desktop (OS X). I am using BeautifulSoup to scrape the data. The Unicode data in the scraped text was causing this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128). 

So I inserted this at the beginning, which solved the problem when printing to the terminal:

import sys 
reload(sys)
sys.setdefaultencoding('utf-8') 

But the problem persists when I send notifications to the desktop. I use terminal-notifier to send desktop notifications.

def notify (title, subtitle, message):
    t = '-title {!r}'.format(title)
    s = '-subtitle {!r}'.format(subtitle)
    m = '-message {!r}'.format(message)
    os.system('terminal-notifier {}'.format(' '.join((m, t, s))))

The images below show the output on the terminal versus the desktop notification.

Output on terminal:

[image]

Desktop notification:

[image]

Also, if I try to remove the comma from the string, I get this error:

new_scorer = str(new_scorer[0].text).replace(",","")

File "live_football_bbc01.py", line 41, in get_score
    new_scorer = str(new_scorer[0].text).replace(",","")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)

How do I get the output on the desktop notifications like the one on the terminal? Thanks!

Edit: Snapshot of the desktop notification (solved).

[image]

sagar_jeevan
  • Don't call str, encode/decode as needed. Why are you calling str anyway? Also your reload logic is a terrible idea. http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script. Another example of why answers like this http://stackoverflow.com/a/31137935/2141635 should be deleted – Padraic Cunningham Sep 05 '16 at 11:35
  • Then I get proper output on the terminal, but in the desktop notification the string begins with 'u', like `'ukrkic' and 'uAg\xc3\b...` – sagar_jeevan Sep 05 '16 at 11:38
  • Are you using python 2? If you are and don't need to be, this is one space where Python 3's default behaviour is much more convenient. – Paul Sep 05 '16 at 14:39
  • Oh yeah? I am using Python 2.7. Will look into that. Maybe it's high time to switch to Python 3. – sagar_jeevan Sep 05 '16 at 14:48

2 Answers

1

You are formatting with !r, which gives you the repr output. Forget the terrible reload logic and either use unicode everywhere:

def notify (title, subtitle, message):
    t = u'-title {}'.format(title)
    s = u'-subtitle {}'.format(subtitle)
    m = u'-message {}'.format(message)
    os.system(u'terminal-notifier {}'.format(u' '.join((m, t, s))))

or encode:

def notify (title, subtitle, message):
    t = '-title {}'.format(title.encode("utf-8"))
    s = '-subtitle {}'.format(subtitle.encode("utf-8"))
    m = '-message {}'.format(message.encode("utf-8"))
    os.system('terminal-notifier {}'.format(' '.join((m, t, s))))
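
If `os.system` is kept, each value still has to be quoted for the shell, which is the job `!r` was doing in the question. A minimal sketch of one alternative, assuming `shlex.quote` (Python 3.3+) with `pipes.quote` as the Python 2.7 fallback; the helper name `build_command` is hypothetical:

```python
try:
    from shlex import quote  # Python 3.3+
except ImportError:
    from pipes import quote  # Python 2.7 fallback


def build_command(title, subtitle, message):
    """Build a shell command line with every value quoted for the shell."""
    parts = [
        'terminal-notifier',
        '-message', quote(message),
        '-title', quote(title),
        '-subtitle', quote(subtitle),
    ]
    return ' '.join(parts)
```

The resulting string could then be handed to `os.system`. Unlike repr, shell quoting adds no `u` prefix and no `\x` escapes to the text.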

When you call str(new_scorer[0].text).replace(",","") you are implicitly trying to encode to ascii; you need to specify the encoding to use:

In [13]: s1=s2=s3= u'\xfc'

In [14]: str(s1) # tries to encode to ascii
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-14-589849bdf059> in <module>()
----> 1 str(s1)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 0: ordinal not in range(128)

In [15]: "{}".format(s1) + "{}".format(s2) + "{}".format(s3) # tries to encode to ascii
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-15-7ca3746f9fba> in <module>()
----> 1 "{}".format(s1) + "{}".format(s2) + "{}".format(s3)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 0: ordinal not in range(128)

You can encode straight away:

In [16]: "{}".format(s1.encode("utf-8")) + "{}".format(s2.encode("utf-8")) + "{}".format(s3.encode("utf-8"))
Out[16]: '\xc3\xbc\xc3\xbc\xc3\xbc'

Or use unicode throughout, prepending a u to the format strings and encoding last:

In [17]: out = u"{}".format(s1) + u"{}".format(s2) + u"{}".format(s3)
In [18]: out
Out[18]: u'\xfc\xfc\xfc'

In [19]: out.encode("utf-8")
Out[19]: '\xc3\xbc\xc3\xbc\xc3\xbc'
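
The same "unicode inside, encode last" idea applies to the comma removal from the question. A minimal sketch, in which the sample string and the helper name `strip_commas` are made up for illustration:

```python
def strip_commas(text):
    """Remove commas from a unicode string without going through str()."""
    return text.replace(u",", u"")


scorer = u"M\xfcller, (12' minutes)"  # hypothetical scraped text containing u'\xfc'
cleaned = strip_commas(scorer)         # still unicode, no implicit ASCII encoding
encoded = cleaned.encode("utf-8")      # encode once, only when bytes are needed
```

Because `str()` is never called, no implicit ASCII encoding takes place and the `UnicodeEncodeError` does not arise.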

If you use !r you are always going to see the repr escapes (such as \xfc) in the output rather than the character itself:

In [30]: print "{}".format(s1.encode("utf-8"))
ü

In [31]: print "{!r}".format(s1).encode("utf-8")
u'\xfc'

You can also pass the args using subprocess:

from subprocess import check_call


def notify(title, subtitle, message):
    # Each list element reaches terminal-notifier as one argument; no shell quoting is needed.
    check_call(['terminal-notifier', '-title', title.encode("utf-8"),
                '-subtitle', subtitle.encode("utf-8"),
                '-message', message.encode("utf-8")])
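
To see why the argument-list form sidesteps shell quoting altogether, a minimal sketch can run a child Python process as a stand-in for terminal-notifier and read back what it receives. The helper `echo_args` is hypothetical, and the sample message is ASCII-only so the demonstration does not depend on the locale:

```python
import subprocess
import sys


def echo_args(args):
    """Run a child Python process that prints each argument it receives on its own line."""
    code = "import sys; print('\\n'.join(sys.argv[1:]))"
    output = subprocess.check_output([sys.executable, "-c", code] + list(args))
    return output.decode("utf-8").splitlines()


# Parentheses and the apostrophe arrive untouched because no shell parses them.
received = echo_args(["-message", "Krkic (49' minutes pen)"])
```

Passing the same text through `os.system` without quoting would let the shell interpret the parentheses and the apostrophe.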
Padraic Cunningham
  • I am using repr to get the quotes for the string so that it matches the terminal-notifier syntax `terminal-notifier -[message|list|remove] [VALUE|ID|ID] [options]`. Otherwise I would get the error, `sh: -c: line 0: syntax error near unexpected token `(' sh: -c: line 0: `terminal-notifier -message Krkic (49' minutes pen) -title Stoke Vs Man City -subtitle 1-4 FT'` – sagar_jeevan Sep 05 '16 at 12:13
  • Using the encoding version, I still get the same output on the desktop notifier. – sagar_jeevan Sep 05 '16 at 12:16
  • Well, use subprocess rather than os.system and pass a list of args to subprocess.check_call – Padraic Cunningham Sep 05 '16 at 12:18
  • Everything is working fine except for the desktop notification. I suspect terminal-notifier has built-in encoding/decoding issues. – sagar_jeevan Sep 05 '16 at 12:42
  • @build_code, in your question what you see are the utf-8 encoded bytes from the repr output, if you printed a tuple with those elements you would see the same output so i don't think it has anything to do with encoding as the encoded bytes are correct, somehow the repr output is getting outputted from the ruby program – Padraic Cunningham Sep 05 '16 at 13:04
  • Yes the `repr` function is modifying the string. I tried with concatenating with `" ' "+'-message { }'.format(repr(message.encode("utf-8")))+ " ' "` to get the quoted string but didn't seem to work, shows syntax error. Any other alternative to make the terminal-notifier execute or alternatives for desktop notifier ? – sagar_jeevan Sep 05 '16 at 13:32
  • Finally got it !! used `m = '-message'+ " '"+message.encode("utf-8")+ "'"` and worked. I am able to get the desktop notification like the one on the terminal. – sagar_jeevan Sep 05 '16 at 13:43
  • Thanks a ton for all the explanation and details and for spending your time. Helped !! – sagar_jeevan Sep 05 '16 at 13:44
  • Didn't use the subprocess code because the issue was with repr as you pointed out. Os.system wasn't causing any problems. – sagar_jeevan Sep 05 '16 at 13:46
  • No worries. I was just going to say split the title and encoded string into separate args and the subprocess code will work fine. I prefer subprocess as it is less error prone and makes passing args simpler. – Padraic Cunningham Sep 05 '16 at 13:46
  • For reference, I have edited the question to include the desktop notification that worked. – sagar_jeevan Sep 05 '16 at 13:54
-1

Use `sys.getfilesystemencoding()` to get your encoding.

Encode your string with it, ignoring or replacing errors:

import sys

encoding = sys.getfilesystemencoding()
msg = new_scorer[0].text.replace(",", "")
print(msg.encode(encoding, errors="replace"))
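
As a minimal sketch of what the error handlers do, assume a target encoding that cannot represent the character (ASCII here, chosen only for illustration; the helper name `encode_for_output` is hypothetical):

```python
def encode_for_output(text, encoding, errors="replace"):
    """Encode unicode text, substituting or dropping characters the codec cannot represent."""
    return text.encode(encoding, errors)


replaced = encode_for_output(u"\xfc", "ascii", "replace")  # b'?'
ignored = encode_for_output(u"\xfc", "ascii", "ignore")    # b''
kept = encode_for_output(u"\xfc", "utf-8")                 # b'\xc3\xbc'
```

With a UTF-8 encoding nothing is lost; with a narrower encoding, `replace` yields `?` and `ignore` drops the character, so the notification text would no longer match the scraped name.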
Laurent LAPORTE