I have the following directory structure on my file system:

/home/myUser/
    stuff_home/
        fizz/
            a.txt
            b.txt
        buzz/
            1.pdf
        widgets/
            c.txt
            2.pdf
            3.pdf
            4.pdf

I want to traverse stuff_home/ recursively and count the number of subdirectories, .txt files and .pdf documents it contains. I have written a small Python script:

import os

dirCnt = 0
txtCnt = 0
pdfCnt = 0

def main():
    get_counts("/home/myUser/stuff_home")

    t = str(txtCnt)
    p = str(pdfCnt)
    d = str(dirCnt)
    print "\nRESULTS\nText Files:\t" + t + "\nPDF Files:\t" + p + "\nDirectories:\t" + d + "\n\n"

def get_counts(root):
    contents = os.listdir(root)

    for file in contents:
        if os.path.isdir(file):
            dirCnt = dirCnt + 1
        elif os.path.splitext(file)[1] == "txt":
            txtCnt = txtCnt + 1
        elif os.path.splitext(file)[1] == "pdf":
            pdfCnt = pdfCnt + 1
        else:
            print "Encountered unknown file: " + file

When I run this, I get no errors, but the script is clearly coded wrong. Here is the output I get:

Encountered unknown file: fizz
Encountered unknown file: buzz
Encountered unknown file: widgets

RESULTS
Text Files:    0
PDF Files:     0
Directories:   0

Does anything jump out to you Pythonians out there? It looks like none of my logic (detecting file vs. directory, and using splitext to grab the file extension) is working here. Thanks in advance!

IAmYourFaja

1 Answer

This seems like a job for os.walk (if I understand correctly):

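import os

def count_pdf_txt(top):
    npdf = 0
    ntxt = 0
    ndir = 0
    # os.walk yields one (root, dirs, files) triple per directory under top
    for root, dirs, files in os.walk(top):
        ndir += len(dirs)
        for f in files:
            if f.endswith('.txt'):  # use `splitext` if you like.
                ntxt += 1
            elif f.endswith('.pdf'):
                npdf += 1
            else:
                print("unknown")  # parenthesized form works in Python 2 and 3

    return npdf, ntxt, ndir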
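
As a sanity check, here is a minimal sketch of the same os.walk idea using splitext instead of endswith. It assumes Python 3 (the question uses Python 2 print statements), and both count_by_walk and make_sample_tree are hypothetical names of mine; the helper only recreates the directory layout from the question in a temporary directory.

```python
import os
import tempfile


def count_by_walk(top):
    """Return (npdf, ntxt, ndir) for everything below `top`."""
    npdf = ntxt = ndir = 0
    for root, dirs, files in os.walk(top):
        ndir += len(dirs)
        for name in files:
            # splitext keeps the leading dot, e.g. ".txt"
            ext = os.path.splitext(name)[1].lower()
            if ext == ".txt":
                ntxt += 1
            elif ext == ".pdf":
                npdf += 1
    return npdf, ntxt, ndir


def make_sample_tree(base):
    """Hypothetical helper: recreate the question's layout under `base`."""
    layout = {
        "fizz": ["a.txt", "b.txt"],
        "buzz": ["1.pdf"],
        "widgets": ["c.txt", "2.pdf", "3.pdf", "4.pdf"],
    }
    top = os.path.join(base, "stuff_home")
    for sub, names in layout.items():
        os.makedirs(os.path.join(top, sub))
        for name in names:
            # create an empty file
            open(os.path.join(top, sub, name), "w").close()
    return top


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(count_by_walk(make_sample_tree(base)))  # (4, 3, 3)
```

For the layout in the question this gives 4 PDFs, 3 text files and 3 directories.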

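Note that your version also has a problem in lines like:

 pdfCnt = pdfCnt + 1

inside your get_counts function. Assigning to pdfCnt inside the function makes it a new local variable which has nothing to do with the global one, and because that local is read before it is ever assigned, the line would raise UnboundLocalError if it ran. You see no error only because execution never gets that far: os.path.isdir(file) is given a bare name rather than os.path.join(root, file), so it is resolved against the current working directory, and os.path.splitext returns the extension with its leading dot (".txt", not "txt"), so every entry falls through to the else branch. In order to have your function change the global variables, you need to declare them as global, e.g. global pdfCnt. However, the presence of a global keyword in your function should always make you think "there's got to be a better way to do this".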
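
If you would rather keep the recursive listdir approach, one way to avoid globals is to have the function return its counts. The following is only a sketch under a few assumptions: Python 3, a return order of (directories, text files, PDFs) that I picked arbitrarily, and no symlink loops in the tree. It joins each name onto root before calling isdir, and it compares against the extension including its leading dot.

```python
import os


def get_counts(root):
    """Return (dir_count, txt_count, pdf_count) for the tree under `root`."""
    dir_cnt = txt_cnt = pdf_cnt = 0
    for name in os.listdir(root):
        # listdir returns bare names, so build the full path first
        path = os.path.join(root, name)
        if os.path.isdir(path):
            dir_cnt += 1
            # recurse and fold the subdirectory's counts into ours
            sub_dirs, sub_txt, sub_pdf = get_counts(path)
            dir_cnt += sub_dirs
            txt_cnt += sub_txt
            pdf_cnt += sub_pdf
        else:
            ext = os.path.splitext(name)[1]  # ".txt", ".pdf", or ""
            if ext == ".txt":
                txt_cnt += 1
            elif ext == ".pdf":
                pdf_cnt += 1
            else:
                print("Encountered unknown file: " + path)
    return dir_cnt, txt_cnt, pdf_cnt
```

Calling get_counts("/home/myUser/stuff_home") and unpacking the returned tuple then replaces the three module-level counters.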

mgilson