
I'm currently teaching myself Python and am in the process of writing my first script. It is a Linux file-search script that recognizes duplicate files by their MD5 hashes. It is written solely for learning purposes, not for a real project.

Here's my code:

from subprocess import Popen, PIPE
import os
def index(directory):
    stack = [directory]
    files = []
    while stack:
        directory = stack.pop()
        for file in os.listdir(directory):
            fullname = os.path.join(directory, file)
            if search_term in fullname:
                files.append(fullname)
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                stack.append(fullname)
    return files

from collections import defaultdict

def check(directory):
    files = index(directory)
    if len(files) < 1:
        print("No file(s) meets your search criteria")
    else:
        print ("List of files that match your criteria:")
        for x in files:
            print (x)
        print ("-----------------------------------------------------------------")
    values = []
    for x in files:
        cmd = ['md5sum', x]
        proc = Popen(cmd, stdout=PIPE)
        (out, err) = proc.communicate()
        a = out.split(' ', 1)
        values.append(a[0])
    proc.stdout.close()
    stat = os.waitpid(proc.pid, 0)
    D = defaultdict(list)
    for i,item in enumerate(values):
        D[item].append(i)
    D = {k:v for k,v in D.items() if len(v)>1}
    for x in D:
        if len(D[x]) > 1:
            print ("File", files[D[x][0]], "is same file(s) as:")
            for y in range(1, len(D[x])):
                print (files[D[x][y]]) 

search_term = input('Enter a (part of) file name for search:')
a = input('Where to look for a file? (enter full path)')
check(a)

My questions regarding the code:

1. I've been advised to replace deprecated os.popen() with subprocess.Popen()

Yet I don't have a clue how to do it. I tried several solutions that I found here on Stack Overflow, but none seems to work in my case, and each one produces some kind of error. For example, when I handle it like this:

from subprocess import Popen, PIPE
...
cmd = ['md5sum', f]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
proc.stdout.close()
stat = os.waitpid(proc.pid, 0)

I get the error NameError: global name 'subprocess' is not defined.

I'm really lost on this one, so any help is appreciated.

2. How to make this program able to search from the top (root)?

If I enter "/" as the search path, I get PermissionError: [Errno 1] Operation not permitted: '/proc/1871/map_files'. Does my script need sudo privileges?

I'm trying to learn Python on my own using the Internet. Thanks for your help!

Reloader

1 Answer

1. If you use the from module import name syntax, you can access the imported name directly. In this case:

from subprocess import Popen, PIPE
proc = Popen(cmd, stdout=PIPE)

If you use the import module syntax, you need to prefix each name with the module name (as you do in your code):

import subprocess
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

For further information about imports, I recommend the article Understanding imports and PYTHONPATH.

2. Some files on your file system can only be read as root, for example some files in the /proc/ directory. To read them, your Python script needs root access, for example via sudo.
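If running as root is not wanted, another option is to skip the directories that cannot be listed. The following is a minimal sketch, assuming a variant of the question's index() that takes search_term as a parameter and returns a second list, skipped; both of those are illustrative choices, not part of the original code.

```python
import os

def index(directory, search_term):
    """Collect paths under `directory` whose full path contains `search_term`.

    Directories that cannot be listed are recorded in `skipped`
    (a hypothetical second return value) instead of aborting the search.
    """
    stack = [directory]
    matches = []
    skipped = []
    while stack:
        current = stack.pop()
        try:
            entries = os.listdir(current)
        except OSError:
            # PermissionError and FileNotFoundError are subclasses of OSError
            skipped.append(current)
            continue
        for name in entries:
            fullname = os.path.join(current, name)
            if search_term in fullname:
                matches.append(fullname)
            # Descend into real directories only, never into symlinks
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                stack.append(fullname)
    return matches, skipped
```

Calling index('/', 'passwd') as a normal user should then return whatever is readable, with the unreadable directories listed in skipped rather than raised as an exception.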

Robin Krahl
  • In this case I need to split result of readline() function, so I store it like this in variable: `res = proc.stdout.readline()` but when I try to `a = res.split(' ', 1)` I get the `TypeError: Type str doesn't support the buffer API` . Can you help me with that? – Reloader Oct 24 '13 at 20:05
  • The reason for this error is rather complex. I found a simple solution that should work for you, see: http://stackoverflow.com/questions/7468668/python-subprocess-readlines. After calling `(out, err) = proc.communicate()`, you should be able to access the command output in `out`. – Robin Krahl Oct 24 '13 at 20:10
  • Still getting the same error code... I must be doing something wrong. I've tried also with the solution provided in question you have linked me, but the same error occurs. I've edited the original code for you to see... – Reloader Oct 24 '13 at 20:40
  • What happens if you test the subprocess within the interactive interpreter? It works for me: http://pastebin.com/M0MipK3H – Robin Krahl Oct 24 '13 at 20:46
  • Ah, this is a Python **3** specific behaviour. I cannot explain it, but it is because `out` is a byte string (`b'...'`). If you convert it to a regular string (`str(out)`), it should work. – Robin Krahl Oct 24 '13 at 23:31
  • I was just trying to 'pickle load' that string of bytes, but simple converting to str is just what I needed. Thanks! ;) – Reloader Oct 24 '13 at 23:44
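A note on the last two comments: in Python 3, `str(out)` on a bytes object returns its printable representation, including the `b'` prefix, so the split succeeds but the first field carries that prefix. Decoding the bytes is the more usual approach. Below is a minimal sketch, assuming `md5sum` is on the PATH; the helper names `parse_md5sum_output` and `md5_of` and the sample line are made up for illustration.

```python
from subprocess import Popen, PIPE

def parse_md5sum_output(out):
    """Return the hash field from one line of md5sum output given as bytes."""
    # Split on a bytes separator first, then decode only the hex digest,
    # so an oddly encoded file name cannot break the decoding step.
    return out.split(b' ', 1)[0].decode('ascii')

def md5_of(path):
    """Run md5sum on `path` and return its hex digest as a str."""
    proc = Popen(['md5sum', path], stdout=PIPE)
    # communicate() reads all output and waits for the process to exit,
    # so no separate stdout.close() or os.waitpid() call is needed.
    out, _ = proc.communicate()
    return parse_md5sum_output(out)

# Illustrative sample of what md5sum prints for an empty file
sample = b'd41d8cd98f00b204e9800998ecf8427e  /tmp/empty.txt\n'
print(parse_md5sum_output(sample))    # d41d8cd98f00b204e9800998ecf8427e
print(str(sample).split(' ', 1)[0])   # b'd41d8cd98f00b204e9800998ecf8427e
```

Because every hash would carry the same `b'` prefix, the `str(out)` version still groups duplicates correctly; the decoded version simply yields clean digests.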