I wrote a little script that uses the Collins website for translation. Here's my code:
import urllib.request
import re

def translate(search):
    base_url = 'http://www.collinsdictionary.com/dictionary/american/'
    url = base_url + search
    p = urllib.request.urlopen(url).read()
    f = open('t.txt', 'w+b')
    f.write(p)
    f.close()
    f = open('t.txt', 'r')
    t = f.read()
    m = re.search(r'(<span class="def">)(\w.*)(</span>]*)',t)
    n = m.group(2)
    print(n)
    f.close()

I have some questions:
I can't use re.search on p directly. It raises this error:
TypeError: can't use a string pattern on a bytes-like object
Is there a way to use re.search on the page without saving it to a file first?
After saving the file I have to reopen it to use re.search, otherwise it raises this error:
TypeError: must be str, not bytes
Why does this error happen?
In this program I want to extract the text between
<span class="def"> and </span>
from the first match only, but the pattern I wrote does not work well in all cases. For example, translate('three') is fine; the output is "totaling one more than two". But for translate('tree') the output is "a treelike bush or shrub ⇒ a rose tree". Is there a way to correct this pattern, with a regular expression or any other tool?
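For reference, here is a minimal sketch of one direction that might address both points. It relies on the fact that urlopen(...).read() returns bytes while the pattern is a str, so decoding the bytes first should let re.search run without a temporary file. It also swaps the greedy `.*` for a non-greedy `.*?` so the match stops at the first closing tag. The function name first_definition, the UTF-8 encoding, and the sample markup are all assumptions made for illustration; the real Collins page may use a different encoding or nest further tags inside the definition span, in which case an HTML parser would likely be more robust than a regular expression.

```python
import re

# Non-greedy pattern: capture up to the FIRST closing </span> after the opening tag.
DEF_PATTERN = re.compile(r'<span class="def">(.*?)</span>', re.DOTALL)


def first_definition(page_bytes, encoding="utf-8"):
    """Return the text of the first <span class="def"> element, or None.

    `encoding` is an assumption; the real page may declare another one.
    """
    # bytes -> str, so a str pattern can be applied without writing a file
    text = page_bytes.decode(encoding)
    match = DEF_PATTERN.search(text)
    return match.group(1) if match else None


# Hypothetical markup, standing in for what urlopen(url).read() might return.
sample = (
    '<span class="def">a treelike bush or shrub</span>'
    '<span class="example"> ⇒ a rose tree</span>'
).encode("utf-8")

print(first_definition(sample))
```

In translate, this would mean passing p straight to first_definition(p) in place of the write and reopen steps, assuming the page really is UTF-8.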