I am writing a Python crawler that clones git repositories and analyzes them. I use subprocess.call() to clone each repository. The problem is that after only a few repositories, I get an "OSError: [Errno 12] Cannot allocate memory":
  File "main.py", line 44, in main
    call(["git", "clone", remote_url.strip(), os.getcwd() + '/' + DIR_NAME])
  File "/usr/lib/python2.7/subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 709, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1222, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
I tried using the sh module as well as GitPython. How can I avoid this problem?
My code is as follows:
for remote_url in remote_urls:
    try:
        if os.path.isdir(os.getcwd() + '/' + DIR_NAME):
            shutil.rmtree(os.getcwd() + '/' + DIR_NAME)
        os.mkdir(DIR_NAME)
        # repo_url = remote_url.replace('ssh://', '')
        call(["git", "clone", remote_url.strip(), os.getcwd() + '/' + DIR_NAME])
        # with sh.git.bake(_cwd=os.getcwd() + '/' + DIR_NAME) as git:
        #     git.clone(remote_url.strip())
        print 'Pulled # ' + str(repo_count) + ' repos'
    except:
        traceback.print_exc()
        continue
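
To narrow this down, I could log the crawler's own memory use just before each clone. As far as I understand, os.fork() can fail with ENOMEM when the kernel cannot reserve memory for a copy of the parent process, so a parent that keeps growing would be a likely suspect. Below is a minimal sketch of such a diagnostic, not a fix. The names peak_rss and clone_logged are hypothetical helpers that are not part of my crawler, and the sketch assumes a Unix system where the resource module is available.

```python
from __future__ import print_function  # keeps the sketch valid on Python 2.7 and 3

import resource
import subprocess
import sys


def peak_rss():
    """Return this process's peak resident set size.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def clone_logged(remote_url, dest_dir):
    """Hypothetical wrapper: log peak memory, then run git clone."""
    # If this number climbs with every repository, the parent process
    # itself is holding on to memory between iterations.
    print('peak RSS before fork: %d' % peak_rss(), file=sys.stderr)
    return subprocess.call(['git', 'clone', remote_url.strip(), dest_dir])
```

In the loop above, clone_logged(remote_url, os.getcwd() + '/' + DIR_NAME) would stand in for the plain call(...) line. A value that rises steadily across iterations would suggest the analysis step is keeping data alive, rather than git itself running out of memory.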