
I have a script that walks a directory and its sub-directories and builds a list in which every item is [path, md5].

When I run it on one directory it works fine, but when I run it on two directories, the second list also contains all of the first list's data.

This is the source code:

import os 
import hashlib
import platform
import sys
import argparse
import HTML


class Map(object):

    #init function
    def __init__(self,param,out_path):
        self.param_list = param
        self.slash = self.slash_by_os()
        self.result_list = []
        self.os = ""
        self.index = 0
        self.html_out_path = out_path

    def export_to_HTML(self):
        html_code = HTML.table(self.result_list,header_row=self.param_list)
        f = open(self.html_out_path,'w')
        f.write(html_code + "<p>\n")
        f.close()


    def calc_md5(self,file_path):
        hash = hashlib.md5()
        with open(file_path, 'rb') as file_to_check:
            for chunk in iter(lambda: file_to_check.read(4096), ''):    
                hash.update(chunk)

        return hash.hexdigest()

    def slash_by_os(self):
        general_id = platform.system()
        actual_os = ""

        if general_id == "Darwin" or general_id == "darwin":
            actual_os = "UNIX"
        elif general_id == "Linux" or general_id == "linux":
            actual_os = "UNIX"
        elif general_id  == "SunOS":
            actual_os = "UNIX"
        elif general_id == "Windows" or general_id == "windows":
            actual_os = "WIN"
        else:
            actual_os = general_id

        if actual_os == "UNIX":
            return '/'
        elif actual_os == "WIN":
            return '\\'
        else:
            return '/'

        self.os = actual_os

    def what_to_do(self,new_dir):
        act = []
        act.append(new_dir[:-1])
        for param in self.param_list:
            if param == "md5":
                x = self.calc_md5(new_dir[:-1])
                act.append(x)   

        return act

    def list_of_files(self ,dir_name ,traversed = [], results = []): 

        dirs = os.listdir(dir_name)
        if dirs:
            for f in dirs:
                new_dir = dir_name + f + self.slash
                if os.path.isdir(new_dir) and new_dir not in traversed:
                    traversed.append(new_dir)
                    self.list_of_files(new_dir, traversed, results)
                else:
                    try:
                        act = self.what_to_do(new_dir)
                        results.append(act)
                    except Exception as e :
                        print "%s excepted %s, couldn't read" % (new_dir,e)
        self.result_list = results
        return results

def parse_args():
    desc = "DirMap 1.0"
    parser = argparse.ArgumentParser(description=desc)
    parser.add_argument('-p1','--ogpath', help='Path To Original Directory', required=True)
    parser.add_argument('-p2','--modpath', help='Path To Modified Directory', required=True)


    args = vars(parser.parse_args())


    params = ['path','md5']
    return args,params


def main():
    args , params = parse_args() 

    og_dir_path = args['ogpath']
    og_map = Map(params,"og.html")
    og_list = og_map.list_of_files(og_dir_path)
    og_map.export_to_HTML()

    mod_dir_path =args['modpath']
    mod_map = Map(params,"mod.html")
    mod_list = mod_map.list_of_files(mod_dir_path)
    mod_map.export_to_HTML()



main() 

Any help?

Fernando Retimo
  • As a starter, I've cleaned up the clumsy formatting in the code + removed the slash logic because there's the built in `os.path.sep` (actually it's even better to use `os.path.join`). – Erik Kaplun Sep 01 '13 at 13:22
  • Great, thanks my friend, but that was the least of my problems right now. The main problem is that the second listing 'remembers' the first list and adds it. I hope you can figure out what I'm doing wrong. – Fernando Retimo Sep 01 '13 at 13:29

1 Answer


The reason is that you're using mutable items as defaults in your definition of list_of_files:

def list_of_files(self ,dir_name ,traversed = [], results = []): 

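These default lists are created once, when the def statement is executed, not on each call. The same list object for results is therefore reused every time the method is called without that argument, so results.append keeps adding to the list left over from the first run. The solution is to use a sentinel value instead: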

def list_of_files(self, dir_name, traversed=None, results=None): 
    if traversed is None:
        traversed = []
    if results is None:
        results = []
    # rest of your method...

There's a fairly good explanation of why this is here. This is a very common issue that people come up against when using Python, until they know about it!
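
To see the behaviour in isolation, here is a minimal sketch. The function names collect_shared and collect_fresh are made up for this illustration and are not part of your script; the first mimics the original signature, the second uses the None sentinel.

```python
def collect_shared(item, results=[]):
    # The default list is created once, when the def statement runs,
    # so every call that omits `results` appends to the same object.
    results.append(item)
    return results


def collect_fresh(item, results=None):
    # None acts as a sentinel: a new list is built on each call
    # that does not pass its own `results`.
    if results is None:
        results = []
    results.append(item)
    return results


first_shared = collect_shared("a")
second_shared = collect_shared("b")
print(second_shared)                  # ['a', 'b']
print(first_shared is second_shared)  # True

first_fresh = collect_fresh("a")
second_fresh = collect_fresh("b")
print(second_fresh)                   # ['b']
```

The same thing happens in your script: both Map instances call the same function object, so the second call to list_of_files starts with the results list that the first call already filled.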

Ben
  • @AdamLewis thanks - I wasn't quite sure where to find the best explanation...! – Ben Sep 01 '13 at 13:46
  • Wow, my friend, you've made my week! I'd like you to explain again why that happened to me. Thanks! – Fernando Retimo Sep 01 '13 at 13:47
  • @FernandoRetimo read the two links (mine in the answer, Adam's in his comment) - they'll give you a good understanding of what's going on, and why it doesn't appear to work 'correctly'. – Ben Sep 01 '13 at 13:48