
This is an app that takes a term, crawls Urban Dictionary, and returns the first meaning found on the page. This is my code so far:

import re
import urllib.request

term = input('Enter a word: ')
url = "https://www.urbandictionary.com/define.php?term=" + term

rawData = urllib.request.urlopen(url).read()
decodedData = rawData.decode("utf-8")

x = re.search('div class="meaning"', rawData)
start = x.start()
end = x.end()
result = rawData[start:end]
print(result)

But I get the error below:

Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    print(decodedData)
  File "~\Python\Python35-32\lib\idlelib\PyShell.py", line 1344, in write
    return self.shell.write(s, self.tags)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 95889-95889: Non-BMP character not supported in Tk

How can I exclude the characters that can't be decoded?

Nitwit

1 Answer


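OK, to solve your problem, you just have to actually use your decoded data. Currently you are decoding your data, but then you still search and slice rawData, which is bytes (re.search with a str pattern raises a TypeError on a bytes object). Use decodedData instead: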

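import re
import urllib.parse
import urllib.request

term = input('Enter a word: ')
# quote() percent-encodes spaces and other special characters in the term
url = "https://www.urbandictionary.com/define.php?term=" + urllib.parse.quote(term)

rawData = urllib.request.urlopen(url).read()  # bytes
decodedData = rawData.decode("utf-8")         # str

# Search and slice the decoded str, not the raw bytes
x = re.search('div class="meaning"', decodedData)
if x is None:
    print("No match found")
else:
    start = x.start()
    end = x.end()
    result = decodedData[start:end]
    print(result)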

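That should do it. If this doesn't work, please post an example word that triggers this error. (By the way, this code will not produce the output you want: the match only covers the literal text div class="meaning", not the definition that follows it.)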
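One possible way to get at the definition itself is a minimal sketch using the standard library's html.parser instead of a regex. It assumes the definition is the text inside the first div whose class includes "meaning" (the class the question searches for; the live page markup may differ or change). The names FirstMeaningParser, first_meaning and strip_non_bmp are hypothetical helpers. strip_non_bmp also relates to the original traceback: it drops characters above U+FFFF, which the error says Tk (used by IDLE's shell) cannot display.

```python
import html.parser


class FirstMeaningParser(html.parser.HTMLParser):
    """Collects the text inside the first <div> whose class includes "meaning".

    Assumption: the page marks definitions with class="meaning", as the
    question's search pattern suggests; the real markup may differ.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0       # nesting depth of <div>s inside the meaning div
        self.found = False   # True once the meaning div has been entered
        self.done = False    # True once the first meaning div has closed
        self.parts = []      # text fragments collected inside the meaning div

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # a nested <div> inside the meaning div
        elif "meaning" in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entered the first meaning div
            self.found = True

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # ignore any later meaning divs

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def first_meaning(page):
    """Return the text of the first meaning div in page (a str), or None."""
    parser = FirstMeaningParser()
    parser.feed(page)
    parser.close()
    if not parser.found:
        return None
    return "".join(parser.parts).strip()


def strip_non_bmp(text):
    """Drop characters above U+FFFF (e.g. most emoji), which Tk cannot display."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


if __name__ == "__main__":
    sample = ('<div class="def"><div class="meaning">A <a href="#">test</a> '
              'word\U0001F600.</div><div class="meaning">second</div></div>')
    print(strip_non_bmp(first_meaning(sample)))  # prints "A test word."
```

In the question's script this could be called as print(strip_non_bmp(first_meaning(decodedData) or "No definition found")). A parser is used rather than a regex because the definition text is typically split across nested tags such as links, which a single pattern handles poorly.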

MegaIng