Using WordNet with PyScript

Question

I'm trying to use WordNet within PyScript, but I can't get WordNet to load properly.

At first I tried:

<py-env>
   - nltk
</py-env>
<py-script>
   import nltk
   from nltk.corpus import wordnet as wn
</py-script>

This gave me a LookupError(resource_not_found), along with the message:

Please use the NLTK Downloader to obtain the resource: >>> import nltk >>> nltk.download('wordnet')

I then tried:

<py-script>
   import nltk
   nltk.download('wordnet')
   from nltk.corpus import wordnet as wn
</py-script>

which gave me this message in the console:

writing to py-3f0adca1-a38a-4161-c36f-7e6548260aa5 [nltk_data] Error loading wordnet: <urlopen error unknown url type: [nltk_data] https> true

I looked at the responses to "Pyodide filesystem for NLTK resources: missing files" and tried to replicate their code:

    from js import fetch
    from pathlib import Path
    import asyncio, os, sys, io, zipfile
    
    response = await fetch('https://github.com/nltk/wordnet/archive/refs/heads/master.zip')
    js_buffer = await response.arrayBuffer()
    py_buffer = js_buffer.to_py()  # this is a memoryview
    stream = py_buffer.tobytes()  # now we have a bytes object

    d = Path("/nltk/wordnet")
    d.mkdir(parents=True, exist_ok=True)

    Path('/nltk/wordnet/master.zip').write_bytes(stream)

    zipfile.ZipFile('/nltk/wordnet/master.zip').extractall(
        path='/nltk/wordnet/'
    )

This is the error message that I got:

    APPENDING: True ==> py-2880055f-8922-cb23-34e4-db404fb1d7a4 --> PythonError: Traceback (most recent call last):
      File "/lib/python3.10/asyncio/futures.py", line 201, in result
        raise self._exception
      File "/lib/python3.10/asyncio/tasks.py", line 232, in __step
        result = coro.send(None)
      File "/lib/python3.10/site-packages/_pyodide/_base.py", line 500, in eval_code_async
        await CodeRunner(
      File "/lib/python3.10/site-packages/_pyodide/_base.py", line 353, in run_async
        await coroutine
      File "<exec>", line 21, in <module>
      File "/lib/python3.10/zipfile.py", line 1258, in __init__
        self._RealGetContents()
      File "/lib/python3.10/zipfile.py", line 1325, in _RealGetContents
        raise BadZipFile("File is not a zip file")
    zipfile.BadZipFile: File is not a zip file

What am I doing wrong? Thanks!

UPDATE:

I tried installing the wn library from PyPI using:

await micropip.install('https://files.pythonhosted.org/packages/ce/f1/53b07100f5c3d41fd33fc78ebb9e99d736b0460ced8acff94840311ffc60/wn-0.9.1-py3-none-any.whl')

But I get the error:

    JsException(PythonError: Traceback (most recent call last):
      File "/lib/python3.10/asyncio/futures.py", line 201, in result
        raise self._exception
      File "/lib/python3.10/asyncio/tasks.py", line 232, in __step
        result = coro.send(None)
      File "/lib/python3.10/site-packages/_pyodide/_base.py", line 500, in eval_code_async
        await CodeRunner(
      File "/lib/python3.10/site-packages/_pyodide/_base.py", line 353, in run_async
        await coroutine
      File "<exec>", line 14, in <module>
      File "/lib/python3.10/site-packages/wn/__init__.py", line 47, in <module>
        from wn._add import add, remove
      File "/lib/python3.10/site-packages/wn/_add.py", line 21, in <module>
        from wn.project import iterpackages
      File "/lib/python3.10/site-packages/wn/project.py", line 12, in <module>
        import lzma
      File "/lib/python3.10/lzma.py", line 27, in <module>
        from _lzma import *
    ModuleNotFoundError: No module named '_lzma'
    )
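
The last frames of this traceback suggest that the failure comes from the `_lzma` C extension being absent from this Python build, not from `wn` itself. A minimal sketch for checking that before installing anything, using only the standard library; `module_available` is a hypothetical helper name, and the output depends on the runtime it is executed in:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for modules in an unusual, half-initialised state.
        return False


# `lzma` is the pure-Python wrapper; `_lzma` is the compiled part it depends on.
for name in ("lzma", "_lzma", "zipfile"):
    print(f"{name}: {'available' if module_available(name) else 'missing'}")
```

If `_lzma` is reported as missing, any package that runs `import lzma` at import time will fail in the same way as above.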

1) You should add **import asyncio**, since the library uses async. 2) Note that your code reads and writes files in the browser's virtual file system that Pyodide provides, not the desktop file system. 3) Double-check that your **fetch** actually succeeded. 4) Verify that the data manipulation results in data with a zip header (the first four bytes contain **PK**, 0x04034b50). — John Hanley, Jul 28 '22 at 20:24
After your update: the GitHub URL will redirect. Your fetch code must follow HTTP redirects. Double-check the return values from fetch and the response data. — John Hanley, Jul 29 '22 at 02:28
1. I included it in my code but forgot to include it in my post. Edited post. 2. Thanks for the heads-up! 3. I see that the fetch attempt returns an error message "Access to fetch at 'https://github.com/nltk/wordnet/archive/refs/heads/master.zip' from origin 'null' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled." — AbsoluteBeginner, Jul 29 '22 at 04:01
From your second comment: I tried using the line "response = await fetch(, {redirect : follow})" but get the error "NameError: name 'redirect' is not defined". Thanks so much for the help btw! — AbsoluteBeginner, Jul 29 '22 at 04:11
FYI: GitHub does not support browser cross-site origin requests. That is why the browser is not following the redirect. If your code checked for the returned status code, you would have detected that. — John Hanley, Jul 29 '22 at 05:40
A solution seems to be to set up a proxy. Is that possible within PyScript? — AbsoluteBeginner, Jul 29 '22 at 16:07
No, it is not possible to set up a proxy with PyScript. You can set up a proxy outside the browser environment. — John Hanley, Jul 29 '22 at 20:11
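
The two checks suggested in the comments (confirm that the fetch succeeded, then confirm that the bytes carry a zip signature) can be sketched in plain Python. This is only a sketch under stated assumptions and does not work around the CORS block: `ensure_ok`, `has_zip_signature`, `ensure_zip`, and `fake_response` are hypothetical names, and the stand-in object only mimics the `ok` and `status` attributes that a JavaScript fetch `Response` exposes.

```python
import io
import zipfile
from types import SimpleNamespace

# 0x04034b50 as it appears on disk (little-endian): "PK" followed by 0x03 0x04.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def ensure_ok(response):
    """Raise if a fetch-style response reports an unsuccessful status."""
    if not response.ok:
        raise RuntimeError(f"fetch failed with HTTP status {response.status}")
    return response


def has_zip_signature(data: bytes) -> bool:
    """Return True if the payload starts with the zip local-file header."""
    return data[:4] == ZIP_LOCAL_HEADER


def ensure_zip(data: bytes) -> bytes:
    """Fail early with a readable message instead of BadZipFile later on."""
    if not has_zip_signature(data) or not zipfile.is_zipfile(io.BytesIO(data)):
        raise ValueError(f"payload is not a zip archive; it starts with {data[:16]!r}")
    return data


# Hypothetical stand-ins so the sketch runs outside the browser.
fake_response = SimpleNamespace(ok=False, status=404)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hi")
good_payload = buffer.getvalue()          # a real, tiny zip archive
bad_payload = b"<!DOCTYPE html><html>"    # e.g. an error page instead of a zip
```

In the question's code, `ensure_ok(response)` would go right after `await fetch(...)` and `ensure_zip(stream)` right before `write_bytes`. A request blocked by CORS usually makes `fetch` reject outright, so the `await` itself may raise before there is any status to inspect.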
