3

I have been working on a script that walks every subdirectory of a directory, matches files with a regex, and runs a different command depending on the kind of file.

So far I have finished the part that runs different commands based on the regex match. Right now it checks for a .zip, .rar or .r00 file and uses a different command for each match. However, I need help iterating through every directory and first checking whether there is a .mkv file in it. If there is, the script should skip that directory and jump to the next one; if there is not, it should run the command for each matching archive and, when it has finished, continue to the next directory.

import os
import re

rx = '(.*zip$)|(.*rar$)|(.*r00$)'
path = "/mnt/externa/folder"

for root, dirs, files in os.walk(path):

    for file in files:
        res = re.match(rx, file)
        if res:
            if res.group(1):
                print("Unzipping ",file, "...")
                os.system("unzip " + root + "/" + file + " -d " + root)
            elif res.group(2):
                os.system("unrar e " + root + "/" + file + " " + root)
            if res.group(3):
                print("Unraring ",file, "...")
                os.system("unrar e " + root + "/" + file + " " + root)

EDIT:

Here is the code i have now:

import os
import re
from subprocess import check_call
from os.path import join

rx = '(.*zip$)|(.*rar$)|(.*r00$)'
path = "/mnt/externa/Torrents/completed/test"

for root, dirs, files in os.walk(path):
    if not any(f.endswith(".mkv") for f in files):
        found_r = False
        for file in files:
            pth = join(root, file)
            try:
                 if file.endswith(".zip"):
                    print("Unzipping ",file, "...")
                    check_call(["unzip", pth, "-d", root])
                    found_zip = True
                 elif not found_r and file.endswith((".rar",".r00")):
                     check_call(["unrar","e","-o-", pth, root,])
                     found_r = True
                     break
            except ValueError:
                print ("Oops! That did not work")

This script works mostly fine, but sometimes I seem to run into issues when there is a Subs folder inside the directory. Here is the error message I get when I run the script:

$ python unrarscript.py

UNRAR 5.30 beta 2 freeware      Copyright (c) 1993-2015    Alexander Roshal


Extracting from /mnt/externa/Torrents/completed/test/The.Conjuring.2013.1080p.BluRay.x264-ALLiANCE/Subs/the.conjuring.2013.1080p.bluray.x264-alliance.subs.rar

No files to extract
Traceback (most recent call last):
  File "unrarscript.py", line 19, in <module>
    check_call(["unrar","e","-o-", pth, root])
  File "/usr/lib/python2.7/subprocess.py", line 541, in     check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['unrar', 'e', '-o-', '/mnt/externa/Torrents/completed/test/The.Conjuring.2013.1080p.BluRay.x264-ALLiANCE/Subs/the.conjuring.2013.1080p.bluray.x264-alliance.subs.rar', '/mnt/externa/Torrents/completed/test/The.Conjuring.2013.1080p.BluRay.x264-ALLiANCE/Subs']' returned non-zero exit status 10

I cannot really understand what is wrong with the code, so I am hoping that some of you are willing to help me.

nillenilsson

4 Answers

2

Just use any to check whether any file ends in .mkv before going any further. You can also simplify to an if/else, since you do the same thing for the last two matches. Using subprocess.check_call would also be a better approach than os.system:

import os
import re
from subprocess import check_call
from os.path import join

rx = '(.*zip$)|(.*rar$)|(.*r00$)'
path = "/mnt/externa/folder"


for root, dirs, files in os.walk(path):
    if not any(f.endswith(".mkv") for f in files):
        for file in files:
            res = re.match(rx, file)
            if res:
                # use os.path.join 
                pth = join(root, file)
                # it can only be res.group(1) or  one of the other two so we only need if/else. 
                if res.group(1): 
                    print("Unzipping ",file, "...")
                    check_call(["unzip" , pth, "-d", root])
                else:
                    check_call(["unrar","e", pth,  root])

You could also forget the regex and just use an if/elif with str.endswith:

for root, dirs, files in os.walk(path):
    if not any(f.endswith(".mkv") for f in files):
        for file in files:
            pth = join(root, file)
            if file.endswith(".zip"):
                print("Unzipping ",file, "...")
                check_call(["unzip" , pth, "-d", root])
            elif file.endswith((".rar",".r00")):
                check_call(["unrar","e", pth,  root])

If you really care about not repeating steps and about speed, you can filter as you iterate: collect the archives by extension (by slicing the file name) while you check for the .mkv, and use for/else logic:

good = {".rar", ".zip", ".r00"}
for root, dirs, files in os.walk(path):
    # group archive paths by extension; the keys include the leading dot
    tmp = {".zip": [], ".rar": [], ".r00": []}
    for file in files:
        ext = file[-4:]
        if ext == ".mkv":
            # a video is already here, skip this directory
            break
        elif ext in good:
            tmp[ext].append(join(root, file))
    else:
        # only reached when the loop did not break, i.e. no .mkv was found
        for p in tmp[".zip"]:
            print("Unzipping ", p, "...")
            check_call(["unzip", p, "-d", root])
        for p in tmp[".rar"] + tmp[".r00"]:
            check_call(["unrar", "e", p, root])

That will short circuit on any match for a .mkv; otherwise it only iterates over the matches for .zip, .rar or .r00. Unless you really care about efficiency, though, I would use the second approach.
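The for/else idea can be tried in isolation on a plain list of names. This is a small sketch under the assumption that every extension of interest is exactly four characters including the dot; the function name `archives_unless_mkv` is hypothetical.

```python
def archives_unless_mkv(files):
    """Return archive names grouped by extension, or None if a .mkv is present."""
    found = {".zip": [], ".rar": [], ".r00": []}
    for name in files:
        ext = name[-4:]
        if ext == ".mkv":
            break  # a video already exists: abandon this directory
        if ext in found:
            found[ext].append(name)
    else:
        # Reached only when the loop finished without hitting `break`.
        return found
    return None
```

The `else` branch belongs to the `for`, not to the `if`, which is why it runs only when no `.mkv` triggered the `break`.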

To avoid overwriting you can unrar/unzip each to a new subdirectory using a counter to help create a new dir name:

from itertools import count


for root, dirs, files in os.walk(path):
    if not any(f.endswith(".mkv") for f in files):
        # one counter per directory, so the names restart at sub_0 each time
        counter = count()
        for file in files:
            pth = join(root, file)
            if file.endswith(".zip"):
                p = join(root, "sub_{}".format(next(counter)))
                os.mkdir(p)
                print("Unzipping ",file, "...")
                check_call(["unzip" , pth, "-d", p])
            elif file.endswith((".rar",".r00")):
                p = join(root, "sub_{}".format(next(counter)))
                os.mkdir(p)
                check_call(["unrar","e", pth,  p])

Each will be unpacked into a new directory under root, i.e. root_path/sub_0, root_path/sub_1, etc.
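The naming scheme can be seen on its own with a temporary directory. This is just a sketch; `make_numbered_dirs` is a hypothetical helper, not part of the script above.

```python
import os
import tempfile
from itertools import count
from os.path import join


def make_numbered_dirs(root, how_many):
    """Create sub_0, sub_1, ... under root and return their paths."""
    counter = count()  # starts at 0 unless a start value is passed
    created = []
    for _ in range(how_many):
        p = join(root, "sub_{}".format(next(counter)))
        os.mkdir(p)
        created.append(p)
    return created


demo_root = tempfile.mkdtemp()
print([os.path.basename(p) for p in make_numbered_dirs(demo_root, 3)])
# prints ['sub_0', 'sub_1', 'sub_2']
```

Note that `os.mkdir` raises an error if the directory already exists, so running the script a second time over the same tree would fail at the first `sub_0` unless that case is handled.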

You probably would have been better off adding an example to your question, but if the real problem is that you only want one of .rar or .r00, then you can set a flag when you find the first match for .rar or .r00 and only unpack if the flag is not set:

for root, dirs, files in os.walk(path):
    if not any(f.endswith(".mkv") for f in files):
        found_r = False
        for file in files:
            pth = join(root, file)
            if file.endswith("zip"):
                print("Unzipping ",file, "...")
                check_call(["unzip", pth, "-d", root])
                found_zip = True
            elif not found_r and file.endswith((".rar",".r00")):
                check_call(["unrar","e", pth,  root])
                found_r = True

If there is also only one zip, you can set two flags and leave the loop when both are set.
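A sketch of the two-flag variant, written as a pure function so it can be tried on a list of names. The name `first_archives` is made up for this example, and it uses `None` values instead of separate boolean flags.

```python
def first_archives(files):
    """Return (first .zip, first .rar/.r00) from files; either may be None."""
    first_zip = None
    first_r = None
    for name in files:
        if first_zip is None and name.endswith(".zip"):
            first_zip = name
        elif first_r is None and name.endswith((".rar", ".r00")):
            first_r = name
        if first_zip is not None and first_r is not None:
            break  # both found, nothing left to look for
    return first_zip, first_r
```

Which file counts as "first" depends on the order of the list, and `os.walk` does not promise any particular order, so sorting `files` first may be worth considering if it matters whether the .rar or the .r00 is picked.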

Padraic Cunningham
  • I tried this and it seemed to work great. However, an issue arises when there is both a .rar file and several .r00 files: the script will successfully unrar the .r00 files and, when finished, begin to extract the .rar file. The problem is that they contain the same content, so it will just want to replace the file it just unpacked. Is there a way to skip this? – nillenilsson Sep 05 '16 at 14:32
  • @nillenilsson, yes, use separate directories for each – Padraic Cunningham Sep 05 '16 at 14:33
  • No, you don't understand. The file structure in the directory is the following: file1.r00 file.r01 file.r02 ... file.r99 file.rar – nillenilsson Sep 05 '16 at 14:36
  • I do understand: create a dir that is unique for each and specify to unrar to that directory – Padraic Cunningham Sep 05 '16 at 14:40
  • I ran the new script, but it's still not right, because I get this: 1. The script runs and extracts the .r00/.r01... files and I get a .mkv in the correct folder; everything is great! 2. But now the script finds a .rar as well in the folder and starts unpacking it, this time into a new folder, and this is not what I want at all. The files are in parts. The .mkv is packaged into rar files, and I don't need to extract both the .r01 and the .rar files because they contain the same file. Therefore I only need to extract one of the .r01 or .rar. I really appreciate your help! – nillenilsson Sep 05 '16 at 15:16
  • You mean one of .r00 or .rar, not both? In your own code you always check for both. Also, what about the zips, are they always unpacked? – Padraic Cunningham Sep 05 '16 at 15:25
  • I just realized that I don't have to check for both .r00 and .rar files; just extracting the .rar file will suffice. – nillenilsson Sep 05 '16 at 15:37
  • So there are always both in the directory? I presumed it could be 0, 1 or 2. If they always both exist, then yes, you only need to find one. – Padraic Cunningham Sep 05 '16 at 15:37
  • Hey again, I am using your code with some slight modifications to try to prevent an error that happens when the script tries to extract the subs. Could you check the main post and try to see what is wrong with the script? I cannot understand why it fails. – nillenilsson Sep 06 '16 at 11:50
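On the traceback added to the question: the exception shown there is subprocess.CalledProcessError, which check_call raises when the command exits with a non-zero status (here unrar printed "No files to extract" and returned 10). The except ValueError clause in the edited script does not match that exception, so it propagates and stops the loop. Below is a minimal sketch of catching it instead; `run_tolerant` is a hypothetical helper name, and a small Python child process stands in for the failing unrar call so the example runs without unrar installed.

```python
import sys
from subprocess import CalledProcessError, check_call


def run_tolerant(cmd):
    """Run cmd; return True on success, False if it exits with a non-zero status."""
    try:
        check_call(cmd)
    except CalledProcessError as err:
        # check_call raises this, not ValueError, for a non-zero exit status.
        print("Command failed with exit status {}: {}".format(err.returncode, cmd))
        return False
    return True


# Stand-in for an unrar run that exits with status 10, as in the traceback.
failing = [sys.executable, "-c", "import sys; sys.exit(10)"]
print(run_tolerant(failing))
```

Whether a failed extraction should be skipped silently, logged, or retried is a design choice; the sketch only shows that the loop can continue once the right exception type is caught.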
1

The example below will work directly! As suggested by @Padraic I replaced os.system with the more suitable subprocess.

What about joining all the file names into a single string and looking for .mkv within that string?

import os
import re
from subprocess import check_call
from os.path import join

rx = '(.*zip$)|(.*rar$)|(.*r00$)'
path = "/mnt/externa/folder"
regex_mkv = re.compile(r'.*\.mkv,')
for root, dirs, files in os.walk(path):

    string_files = ','.join(files)+', '
    if regex_mkv.match(string_files): continue

    for file in files:
        res = re.match(rx, file)
        if res:
            # use os.path.join 
            pth = join(root, file)
            # it can only be res.group(1) or  one of the other two so we only need if/else. 
            if res.group(1): 
                print("Unzipping ",file, "...")
                check_call(["unzip" , pth, "-d", root])
            else:
                check_call(["unrar","e", pth,  root])
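The join-and-search check can be isolated and tried on plain lists. This sketch swaps `match` with a leading `.*` for `search`, which looks for the same thing, and appends a single trailing comma so that a .mkv in last position is also caught. The function name `has_mkv_joined` is made up for the example.

```python
import re

# A name ending in ".mkv" is always directly followed by a comma after joining.
regex_mkv = re.compile(r"\.mkv,")


def has_mkv_joined(files):
    """Join the names with commas (plus one trailing comma) and search once."""
    joined = ",".join(files) + ","
    return regex_mkv.search(joined) is not None
```

One caveat of this approach: a single file name that itself contains ".mkv," would produce a false positive, which the per-file `endswith` check does not suffer from.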
Riccardo Petraglia
  • I'm sorry, but I don't understand. I realize that this lets me find the .mkv files, but how do I unpack the .zip and .rar files as well? – nillenilsson Sep 05 '16 at 12:59
  • Maybe I did not really understand what you want... The snippet I proposed will skip the directories containing at least one file that ends with ".mkv". Is this not what you want? – Riccardo Petraglia Sep 05 '16 at 13:55
  • @Padraic Cunningham Although simpler, the solution using any requires an "if" for each element of the list. Comparing over a list of 780 names, your method is ~3 times slower than using a regex. Anyway, replacing os.system with subprocess is quite useful! I will edit my answer based on that. – Riccardo Petraglia Sep 05 '16 at 14:04
  • Did you time the join first? Also, your regex is wrong: what happens when the last file ends in .mkv? It won't look like `foo.mkv,` so you would need to add more logic to catch that. – Padraic Cunningham Sep 05 '16 at 14:25
  • @PadraicCunningham Yes, I included the join in the timing. I corrected the code to account for the bug you found... Thank you – Riccardo Petraglia Sep 05 '16 at 17:58
0

re is overkill for something like this. There's a library function for extracting file extensions, os.path.splitext. In the following example, we build an extension-to-filenames map and we use it both for checking the presence of .mkv files in constant time and for mapping each filename to the appropriate command.

Note that you can unzip files with zipfile (standard lib) and third-party packages are available for .rar files.

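import os

path = "/mnt/externa/folder"

for root, dirs, files in os.walk(path):
    # map each extension to the file names that carry it
    ext_map = {}
    for fn in files:
        ext_map.setdefault(os.path.splitext(fn)[1], []).append(fn)
    if '.mkv' not in ext_map:
        # items() works on both Python 2 and 3
        for ext, fnames in ext_map.items():
            for fn in fnames:
                # os.walk yields bare names, so join them with root
                pth = os.path.join(root, fn)
                if ext == ".zip":
                    os.system("unzip %s -d %s" % (pth, root))
                elif ext == ".rar" or ext == ".r00":
                    os.system("unrar e %s %s" % (pth, root))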
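As for the note about zipfile, here is a rough sketch of unpacking a .zip with the standard library instead of shelling out to unzip. The helper name `unzip_here` is hypothetical; it extracts into the directory that contains the archive, which mirrors the `-d root` usage above. The demo builds a throwaway archive in a temporary directory so it can run anywhere.

```python
import os
import tempfile
import zipfile


def unzip_here(zip_path):
    """Extract zip_path into the directory that contains it; return member names."""
    target = os.path.dirname(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Demo on a throwaway archive in a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("hello.txt", "hi")
print(unzip_here(demo_zip))  # prints ['hello.txt']
```

This avoids building a shell command string from file names, which also sidesteps quoting problems with names that contain spaces.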
simleo
  • Building the dict is O(n) so you are not reducing the search time. The only way a dict would make sense is if you used dict.get to lookup the extensions in the last loop. – Padraic Cunningham Sep 05 '16 at 14:07
  • You have to iterate once to check for .mkv and twice to unpack archives. Since the dictionary is built while checking for .mkv, which you have to do anyway, it's adding nothing to the complexity. Doing a regex match for each iteration, though, likely makes it quadratic. – simleo Sep 05 '16 at 14:44
-2
import os
import re

regex = re.compile(r'(.*zip$)|(.*rar$)|(.*r00$)')
path = "/mnt/externa/folder"
for root, dirs, files in os.walk(path):
    for file in files:
        res = regex.match(file)
        if res:
           if res.group(1):
              print("Unzipping ",file, "...")
              os.system("unzip " + root + "/" + file + " -d " + root)
           elif res.group(2):
              os.system("unrar e " + root + "/" + file + " " + root)
           else:
              print("Unraring ",file, "...")
              os.system("unrar e " + root + "/" + file + " " + root)