To download the question-and-answer data, I am following the instructions in the facebook/ELI5 repository, which say to run:
python download_reddit_qalist.py -Q
Running this command fails at line number 70 of 'download_reddit_qalist.py', where the ZstdDecompressor stream is enumerated. The error log says:
zstd.ZstdError: Zstd decompress error: Frame requires too much memory for decoding
Thinking it was a memory issue, I allocated 32 GB of memory and 8 CPUs to the container, but the error persists.
When I replaced the enumerate() call with ElementTree.iterparse(), an additional traceback appeared alongside the same error:
for i, l in ET.iterparse(f):
File "/anaconda3/lib/python3.8/xml/etree/ElementTree.py", line 1229, in iterator
data = source.read(100 * 2048)
zstd.ZstdError: zstd decompress error: Frame requires too much memory for decoding
Has anyone faced a similar error? The Docker container is running on a Slurm cluster. Let me know if you need more information.