Merge 4 downloaded files into one

Question

I have 4 files of 50 MB each that I created by breaking up a larger file with the Linux split command.

I'm trying to download them and merge them back into one file with the code below, but it tells me it cannot find the files.

import requests
import tempfile
import os
import subprocess as sp

def download_files_from_github(path, model_name):
    if model_name == "u2net":
        part1 = tempfile.NamedTemporaryFile(delete=False)
        part2 = tempfile.NamedTemporaryFile(delete=False)
        part3 = tempfile.NamedTemporaryFile(delete=False)
        part4 = tempfile.NamedTemporaryFile(delete=False)
        try:
            part1_content = requests.get('https://github.com/nadermx/backgroundremover/raw/main/models/u2aa')
            part1.write(part1_content.content)
            part1.close()
            part2_content = requests.get('https://github.com/nadermx/backgroundremover/raw/main/models/u2ab')
            part2.write(part2_content.content)
            part2.close()
            part3_content = requests.get('https://github.com/nadermx/backgroundremover/raw/main/models/u2ac')
            part3.write(part3_content.content)
            part3.close()
            part4_content = requests.get('https://github.com/nadermx/backgroundremover/raw/main/models/u2ad')
            part4.write(part4_content.content)
            part4.close()
            stuff = sp.run('cat %s %s %s %s > %s' % (part1.name, part2.name, part3.name, part4.name, path))
            print(stuff)
        finally:
            os.remove(part1.name)
            os.remove(part2.name)
            os.remove(part3.name)
            os.remove(part4.name)


download_files_from_github('~/.u2net/u2net.pth', 'u2net')

And I get this error:

$ python tests.py
Traceback (most recent call last):
  File "tests.py", line 34, in <module>
    download_files_from_github('~/.u2net/u2net.pth', 'u2net')
  File "tests.py", line 25, in download_files_from_github
    stuff = sp.run('cat %s %s %s %s > %s' % (part1.name, part2.name, part3.name, part4.name, path))
  File "/usr/lib/python3.6/subprocess.py", line 423, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'cat /tmp/tmp28877_uq /tmp/tmpx2t9s9we /tmp/tmpj4g8ahhw /tmp/tmpty1x7pjv > ~/.u2net/u2net.pth': 'cat /tmp/tmp28877_uq /tmp/tmpx2t9s9we /tmp/tmpj4g8ahhw /tmp/tmpty1x7pjv > ~/.u2net/u2net.pth'

The whole string you are passing to `subprocess.run` is interpreted as the executable name. There is no executable with that name. The quick and dirty fix could be to use `shell=True`. — mkrieger1, Oct 19 '21 at 19:05
Does this answer your question? [File not found error when launching a subprocess containing piped commands](https://stackoverflow.com/questions/24306205/file-not-found-error-when-launching-a-subprocess-containing-piped-commands) — mkrieger1, Oct 19 '21 at 19:08
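
The behaviour described in the comments above can be reproduced with a small sketch. It assumes a POSIX system where cat and /bin/sh are available. The helper names concat_with_shell and demo, and the tiny two-part test files, are hypothetical and only there for illustration.

```python
import os
import shlex
import subprocess as sp
import tempfile


def concat_with_shell(parts, dest):
    """Concatenate the files in `parts` into `dest` by running cat through a shell."""
    # shlex.quote protects paths that contain spaces or shell metacharacters.
    quoted_parts = " ".join(shlex.quote(p) for p in parts)
    command = "cat %s > %s" % (quoted_parts, shlex.quote(dest))
    # shell=True hands the whole string to the shell, which understands '>'.
    return sp.run(command, shell=True, check=True)


def demo():
    """Return (failed_without_shell, merged_bytes) for two small test parts."""
    workdir = tempfile.mkdtemp()
    parts = []
    for index, payload in enumerate([b"aa", b"bb"]):
        part_path = os.path.join(workdir, "part%d" % index)
        with open(part_path, "wb") as handle:
            handle.write(payload)
        parts.append(part_path)
    dest = os.path.join(workdir, "merged")

    # Without shell=True the whole string is looked up as a program name.
    try:
        sp.run("cat %s > %s" % (" ".join(parts), dest))
        failed = False
    except FileNotFoundError:
        failed = True

    concat_with_shell(parts, dest)
    with open(dest, "rb") as handle:
        return failed, handle.read()
```

A destination such as ~/.u2net/u2net.pth should be passed through os.path.expanduser first, because quoting the path stops the shell from expanding the tilde.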

score 2 · Accepted Answer · answered Oct 19 '21 at 19:12

As a commenter pointed out, subprocess treats the whole string as the name of a single executable, and no such executable exists.

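A good option is to replace the string argument to subprocess.run with a list. Note that > is shell syntax, not an argument to cat, so the redirection has to be done from Python through stdout:

# pass a list directly and redirect cat's output to the target file
with open(os.path.expanduser(path), 'wb') as out:
    stuff = sp.run(["cat", part1.name, part2.name, part3.name, part4.name], stdout=out)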

This worked for me.

score 2 · Answer 2 · answered Oct 19 '21 at 19:54

Even though the problem has already been resolved, I would do it without subprocess, something like this:

import requests

def download_files_from_github(git_paths, out_file, model_name):
    if model_name == "u2net":
        download_status = {}
        # append every downloaded part to the same output file, in order
        with open(out_file, 'wb') as out:
            for git_path in git_paths:
                response = requests.get(git_path)
                out.write(response.content)
                # remember the HTTP status of each part for the caller
                download_status[git_path] = response.status_code
        return download_status

Then call the function and capture the returned status codes. This is important: if one of the URLs returns a 404 status code, the calling program may want to discard the output file, or the download function itself could raise an error.

status = download_files_from_github(
    ['https://github.com/nadermx/backgroundremover/raw/main/models/u2aa',
     'https://github.com/nadermx/backgroundremover/raw/main/models/u2ab',
     'https://github.com/nadermx/backgroundremover/raw/main/models/u2acc',
     'https://github.com/nadermx/backgroundremover/raw/main/models/u2ad'],
    '<your consolidated out file>', 'u2net')

print(status)
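
One way to act on the point about failed downloads is to raise on a bad status and only move the merged file into place once every part has arrived. The following is a minimal sketch under that assumption, not the answerer's code. The names fetch_with_requests and merge_parts are hypothetical, and the fetch parameter exists only so the merge logic can be exercised without network access.

```python
import os
import tempfile


def fetch_with_requests(url):
    """Download one URL and return its bytes, raising on HTTP errors such as 404."""
    import requests  # third-party; imported lazily so merge_parts has no hard dependency

    response = requests.get(url, timeout=60)
    response.raise_for_status()
    return response.content


def merge_parts(urls, out_file, fetch=fetch_with_requests):
    """Write all parts to a temporary file and move it to out_file only on full success."""
    out_file = os.path.expanduser(out_file)
    out_dir = os.path.dirname(out_file) or "."
    os.makedirs(out_dir, exist_ok=True)
    # Create the temporary file next to the target so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=out_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for url in urls:
                tmp.write(fetch(url))
        os.replace(tmp_name, out_file)
    except BaseException:
        # Any failure (including a 404 raised by fetch) discards the partial file.
        os.remove(tmp_name)
        raise
    return out_file
```

With this shape, a 404 on any part leaves no half-written output file behind, so the caller does not need to inspect status codes afterwards.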

score 1 · Answer 3 · Samuel · 2021-10-19T19:20:50.023

Try changing

stuff = sp.run('cat %s %s %s %s > %s' % (part1.name, part2.name, part3.name, part4.name, path))

to

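with open(os.path.expanduser(path), 'wb') as out:
    stuff = sp.run(f'cat {part1.name} {part2.name} {part3.name} {part4.name}'.split(), stdout=out)

You need to pass a list to sp.run, not a string. Here I build the command string and then split it into a list holding the program and its arguments. The > redirection is shell syntax that cat would treat as a file name, so the output file is passed through stdout instead.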
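
Splitting a command string on whitespace is fine for names like /tmp/tmp28877_uq, but it breaks as soon as a path contains a space. A hedged sketch of a more robust variant, using the standard library's shlex module, is shown here. The helper name build_cat_argv is hypothetical.

```python
import shlex


def build_cat_argv(parts):
    """Build an argument list for cat from file paths that may contain spaces."""
    # Quote each path, join into one command string, then split it shell-style.
    command = "cat " + " ".join(shlex.quote(p) for p in parts)
    return shlex.split(command)


def naive_argv(parts):
    """Same idea with str.split, which cuts a path at every space."""
    return ("cat " + " ".join(parts)).split()
```

When the paths are already available as separate values, building the list directly, as in ["cat", *parts], avoids the string round trip altogether.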

edited Oct 19 '21 at 19:20

answered Oct 19 '21 at 19:11
