I am attempting to write a Python script to download and unzip hundreds of files from an AWS server. As I understand it, these tasks are I/O-bound, so I would like to use multithreading to speed up processing.
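To make the goal concrete, here is a rough sketch of the kind of script in question. It assumes each file is a zip archive reachable through a plain URL (the real AWS setup may need credentials, which this does not handle). The names `download_and_unzip`, `download_all`, and `max_workers=8` are hypothetical placeholders, not taken from either guide.

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlopen


def download_and_unzip(url, dest_dir):
    """Download one zip archive into memory and extract it under dest_dir."""
    with urlopen(url) as response:
        payload = response.read()
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


def download_all(urls, dest_dir, max_workers=8):
    """Run download_and_unzip over all urls on a pool of worker threads."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the input order; a failed download raises here,
        # when its result is consumed by list().
        return list(pool.map(lambda u: download_and_unzip(u, dest_dir), urls))
```

Threads seem a reasonable fit here because each worker spends most of its time waiting on the network or the disk rather than running Python code.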
Since I am new to Python, I've been reading guides like this one and that one on multithreading and multiprocessing.
Both of the above links suggest code that imports from the multiprocessing library, which in turn imports from subprocess, but I am running into trouble completing these imports. The second link above suggests the following code to illustrate multithreading:
from multiprocessing import Pool as ProcessPool
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool with the same API
from urllib.request import urlopen

def run_tasks(function, args, pool, chunk_size=None):
    # Apply function to every item in args using the given pool
    results = pool.map(function, args, chunk_size)
    return results

def work(n):
    # Fetch the page and keep only the first 32 bytes
    with urlopen("https://www.google.com/#{n}") as f:
        contents = f.read(32)
    return contents

if __name__ == '__main__':
    numbers = [x for x in range(1, 100)]

    # Run the task using a thread pool
    t_p = ThreadPool()
    result = run_tasks(work, numbers, t_p)
    print(result)
    t_p.close()
When I tried running this script, I got the following error with traceback:
PS C:\Users\USERNAME> & "C:/Users/USERNAME/AppData/Local/Continuum/anaconda3/python.exe" "h:/Post-Processing/API Query/Python Test/subprocess_test/subprocess.py"
Traceback (most recent call last):
  File "h:/Post-Processing/API Query/Python Test/subprocess_test/subprocess.py", line 38, in <module>
    t_p = ThreadPool()
  File "C:\Users\USERNAME\AppData\Local\Continuum\anaconda3\lib\multiprocessing\dummy\__init__.py", line 123, in Pool
    from ..pool import ThreadPool
  File "C:\Users\USERNAME\AppData\Local\Continuum\anaconda3\lib\multiprocessing\pool.py", line 26, in <module>
    from . import util
  File "C:\Users\USERNAME\AppData\Local\Continuum\anaconda3\lib\multiprocessing\util.py", line 17, in <module>
    from subprocess import _args_from_interpreter_flags
ImportError: cannot import name '_args_from_interpreter_flags' from 'subprocess' (h:\PSO Post-Processing\API Query\Python Test\subprocess_test\subprocess.py)
I found this SO thread, in which the answer suggests adding
from subprocess import _args_from_interpreter_flags
to the list of imports. However, when I added this line, the import error simply moved into my own script:
Traceback (most recent call last):
  File "h:/Post-Processing/API Query/Python Test/subprocess_test/subprocess.py", line 20, in <module>
    from subprocess import _args_from_interpreter_flags
  File "h:\Post-Processing\API Query\Python Test\subprocess_test\subprocess.py", line 20, in <module>
    from subprocess import _args_from_interpreter_flags
ImportError: cannot import name '_args_from_interpreter_flags' from 'subprocess' (h:\PSO Post-Processing\API Query\Python Test\subprocess_test\subprocess.py)
I now suspect that something is wrong with my Python installation, but I am not sure how to troubleshoot it.
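One check that might help narrow this down is to ask Python which file it resolves for a given module name. Below is a minimal sketch; the helper name `module_origin` is hypothetical, and it relies only on the standard `importlib.util.find_spec`.

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load for module `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Print where each module involved in the traceback is loaded from
    for name in ("subprocess", "multiprocessing"):
        print(name, "->", module_origin(name))
```

If the path printed for subprocess points at a file in the script's own folder rather than the Anaconda Lib folder, that would match the path shown in parentheses at the end of the ImportError messages above.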
I am running Windows 10 on a work computer and using Visual Studio Code as my editor. According to Visual Studio Code, I'm running Python 3.7.6 64-bit ('Continuum': virtualenv). I found that I have subprocess.py installed at
"C:\Users\USER\AppData\Local\Continuum\anaconda3\Lib\subprocess.py"
and this subprocess.py file does contain the following function:
def _args_from_interpreter_flags():
    """Return a list of command-line arguments reproducing the current
    settings in sys.flags, sys.warnoptions and sys._xoptions."""
    flag_opt_map = {
        'debug': 'd',
        # 'inspect': 'i',
        # 'interactive': 'i',
        'dont_write_bytecode': 'B',
        'no_site': 'S',
        'verbose': 'v',
        'bytes_warning': 'b',
        'quiet': 'q',
        # -O is handled in _optim_args_from_interpreter_flags()
    }
    args = _optim_args_from_interpreter_flags()
    for flag, opt in flag_opt_map.items():
        v = getattr(sys.flags, flag)
        if v > 0:
            args.append('-' + opt * v)
    if sys.flags.isolated:
        args.append('-I')
    else:
        if sys.flags.ignore_environment:
            args.append('-E')
        if sys.flags.no_user_site:
            args.append('-s')
    # -W options
    warnopts = sys.warnoptions[:]
    bytes_warning = sys.flags.bytes_warning
    xoptions = getattr(sys, '_xoptions', {})
    dev_mode = ('dev' in xoptions)
    if bytes_warning > 1:
        warnopts.remove("error::BytesWarning")
    elif bytes_warning:
        warnopts.remove("default::BytesWarning")
    if dev_mode:
        warnopts.remove('default')
    for opt in warnopts:
        args.append('-W' + opt)
    # -X options
    if dev_mode:
        args.extend(('-X', 'dev'))
    for opt in ('faulthandler', 'tracemalloc', 'importtime',
                'showalloccount', 'showrefcount', 'utf8'):
        if opt in xoptions:
            value = xoptions[opt]
            if value is True:
                arg = opt
            else:
                arg = '%s=%s' % (opt, value)
            args.extend(('-X', arg))
    return args
Given all this information, I am sure that I'm missing a simple detail that's stopping the threading code from working. I appreciate any help you can give.
Thank you!!