5

I am new to Python and I am using it to do some data analysis.

My problem is the following: I have a directory with many subdirectories, each one of which contains a large number of data files.

I already wrote a Python script which, when executed in one of those subdirectories, performs the data analysis and writes the results to an output file. The script includes some shell commands that I call using os.system(), so I have to "be" in one of the subdirectories for it to work.

How can I write a function that automatically:

  1. Moves into the first subdirectory
  2. Executes the script
  3. Goes back to the parent directory and moves to the next subdirectory

I guess that this could be done in some way using os.walk() but I didn't really understand how it works.

PS I am aware of the existence of this post but it doesn't solve my problem.

PPS Maybe I should point out that my function does not take the directory name as argument. Actually it takes no argument.

valerio

5 Answers

3

os.walk should work perfectly for what you want to do. Get started with this code and you should see what you need to do:

import os
start_path = r'C:\mystartingpath'

# os.walk yields one (path, dirs, files) tuple per directory it visits
for (path, dirs, files) in os.walk(start_path):
    print("Path:", path)

    print("\nDirs:")
    for d in dirs:
        print('\t' + d)

    print("\nFiles:")
    for f in files:
        print('\t' + f)

    print("----")

This code shows that os.walk iterates through every subdirectory of your chosen starting path. Inside each directory, you can get the full path to a file by joining the directory path and the file name. For example:

path_to_interesting_file = os.path.join(path, filename)

# (This assumes that you saved your file name in a variable called filename)

With the full path to each file, you can perform your analysis while in the os.walk for loop. Add your analysis code so that the for loop is doing more than just printing contents.
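As a minimal sketch of what "adding your analysis code" to the loop might look like, the example below builds each file's full path with os.path.join and hands it to a placeholder. The names analyze_file and analyze_tree are hypothetical, and the "analysis" here is only a file-size lookup standing in for the real work.

```python
import os


def analyze_file(file_path):
    """Hypothetical placeholder for the real analysis: returns the size in bytes."""
    return os.path.getsize(file_path)


def analyze_tree(start_path):
    """Walk start_path recursively and analyze every file found.

    Returns a dict mapping each file's full path to its analysis result.
    """
    results = {}
    for current_dir, dirs, files in os.walk(start_path):
        for filename in files:
            # os.path.join picks the right separator for the platform
            full_path = os.path.join(current_dir, filename)
            results[full_path] = analyze_file(full_path)
    return results
```

Because every file is addressed by its full path, this variant never needs to change the working directory.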

Chris Nielsen
2

To change your working directory in Python you need:

os.chdir(your_path)

You can then recursively run your script.

Example Code:

import os

directory_to_check = "your_dir" # Which directory do you want to start with?

def my_function(directory):
    print("Listing: " + directory)
    print("\t-" + "\n\t-".join(os.listdir(".")))  # List current working directory

# Get all the subdirectories of directory_to_check recursively and store them in a list:
directories = [os.path.abspath(x[0]) for x in os.walk(directory_to_check)]
directories.remove(os.path.abspath(directory_to_check))  # If you don't want your main directory included

for i in directories:
    os.chdir(i)         # Change working directory
    my_function(i)      # Run your function

I don't know how your script works because your question is quite general, so I can only give a general answer.

But I think what you need is:

  1. Get all subdirectories and store them using os.walk
  2. Change your working directory with os.chdir

os.walk alone won't work, because it only lists the directories; it does not change the working directory for you.
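The two steps above (collect the subdirectories, then change into each one) can be sketched with a small context manager that always returns to the previous directory, even if the analysis fails. The names working_directory and run_in_each_subdirectory are made up for this illustration. The sketch assumes that only the immediate subdirectories should be visited and that the analysis is a function taking no arguments, as described in the question.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory to path."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always go back, even if the analysis raises an exception
        os.chdir(previous)


def run_in_each_subdirectory(parent, func):
    """Call the zero-argument func once inside each immediate subdirectory of parent."""
    parent = os.path.abspath(parent)  # absolute path, so later chdir calls cannot break it
    for name in sorted(os.listdir(parent)):
        subdir = os.path.join(parent, name)
        if os.path.isdir(subdir):
            with working_directory(subdir):
                func()
```

A call such as run_in_each_subdirectory("your_dir", my_analysis) would then run a hypothetical my_analysis() in every subdirectory. On Python 3.11 and later, contextlib.chdir provides the same temporary directory change out of the box.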

I hope this helps! Good luck!

ant0nisk
  • But this way I get stuck in the first subdirectory at the first iteration and I get "[Errno 2] No such file or directory: subdirectory_name". It should go back in the parent directory after the function is executed... – valerio Jun 05 '16 at 17:41
  • Yes. That is why I mentioned that you need absolute paths... I updated the code so that it suits your needs :) – ant0nisk Jun 05 '16 at 17:46
  • Ok, I had to write "__file__" with quotes to make that line work (otherwise I get "name '__file__' is not defined"), but it works! Except for one thing...for some reason the absolute path to the parent directory gets included in the "directories" list. How can I avoid that? – valerio Jun 05 '16 at 18:46
  • (I meant to write "files" with the underscores but I got bold instead) – valerio Jun 05 '16 at 18:49
  • what do you use in the directory_to_check variable? If you use '.' (to indicate the current directory) then what you say happens. But, try to run your script one directory above, and use directory_to_check='your_dir' to avoid this... (If I understand the problem correctly...) – ant0nisk Jun 05 '16 at 18:53
  • I used directory_to_check="my_dir_name" (with quotes) from one directory above, as you said, but my_dir_name gets included in the list for some reason – valerio Jun 05 '16 at 18:58
  • Hmmm... I don't know why this happens to you, for me it is not included... However, you can avoid it by adding this after the 'directories = [os.path.... ]' line: directories.remove(os.path.abspath("my_dir_name")) – ant0nisk Jun 05 '16 at 19:01
  • I will add it to the code.... Please check this as a correct answer if it is what you need :) – ant0nisk Jun 05 '16 at 19:05
  • I don't really know why...Anyway thanks a lot, you solved my problem :-) – valerio Jun 05 '16 at 19:07
1

This can be done like this:

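for name in os.listdir(your_root_directory):
    # os.listdir also returns files, so keep only the directories
    if os.path.isdir(os.path.join(your_root_directory, name)):
        yourFunction(name)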

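The os.listdir method returns the names of all entries (files as well as directories) directly inside the root directory, so check each one with os.path.isdir if you only want directories.

The os.walk method, by contrast, traverses the directories recursively. That makes it useful for other tasks, but for a single level os.listdir might be the better choice.

However, for the sake of completeness, here is an os.walk option: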

for dir in next(os.walk(your_directory))[1]:
    yourFunction(dir)

Notice that os.walk is a generator, hence the next call. The first next call produces a tuple (root, dirs, files), and the root in this case is your directory. You are only interested in dirs, the list of subdirectories, so you index [1].
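To make the next(os.walk(...)) idiom concrete, here is a small sketch that wraps it in a helper returning full paths. The name immediate_subdirectories is hypothetical, and returning full paths (rather than bare names) is a choice made for this example so that the result can be used from any working directory.

```python
import os


def immediate_subdirectories(root):
    """Return full paths of the directories directly inside root (no recursion)."""
    try:
        # The first tuple yielded by os.walk describes root itself
        _, dirs, _ = next(os.walk(root))
    except StopIteration:
        # os.walk yields nothing when root does not exist or cannot be read
        return []
    return [os.path.join(root, d) for d in sorted(dirs)]
```

Only the first tuple is consumed, so nested directories are never visited.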

SirSteel
  • Maybe I should have pointed this out, but my function doesn't take the directory name as argument. Actually it takes no argument. – valerio Jun 05 '16 at 17:28
  • Well, it should not be hard to make it so that it does. Otherwise you would need to use globals, which is bad form for python. Making a function take a folder on which it operates as an argument is what modularity is for. So that you can reuse it in other occasions. – SirSteel Jun 05 '16 at 19:06
  • Why is using globals bad form for Python? – Chris Nielsen Jun 05 '16 at 19:27
  • For his application, the use of globals would be unnecessary. Also, generally globals defeat the purpose of blackbox idea behind programming. Even if doing OOP, functional programming still applies to many methods. Therefore it is a bad practice to use globals, unless in cases where you absolutely need them. But those cases are rare. They also make debugging a lot harder. – SirSteel Jun 05 '16 at 20:00
  • This link explains it better and in more depth: https://stackoverflow.com/questions/19158339/why-are-global-variables-evil – SirSteel Jun 05 '16 at 20:06
  • [os.listdir](https://docs.python.org/3/library/os.html#os.listdir) returns not a list of directories but "a list containing the names of the entries in the directory" which is a list of files and directories. – Vladislav Povorozniuc Dec 07 '22 at 20:34
0

If you want to do a certain action for every sub-folder of a folder, one way is to write a recursive function, processing each directory one at a time. I hope my example helps a little bit: http://pastebin.com/8G7JzcQ2

  • Please add the code to your answer. See [How do I format my code blocks?](http://meta.stackexchange.com/questions/22186/how-do-i-format-my-code-blocks) – Tone Jun 05 '16 at 17:40
0

I was doing something similar: cd into every subdirectory and run git commands, etc. Here is a shortened version:

import os
import subprocess

if __name__ == "__main__":
    # current working directory; the subdirectories are expected here
    ROOT_PATH = os.getcwd()

    # all files, folders in script's directory
    for name in os.listdir(ROOT_PATH):
        dir_path = os.path.abspath(name)

        # if a subdirectory
        if os.path.isdir(dir_path):
            # cd to subdirectory
            os.chdir(dir_path)

            # could run a script subprocess.run(["python", "my_script.py"])
            # or you could run all commands here one by one
            git_log = subprocess.getoutput("git log -n1")  # getoutput expects a command string
            print(git_log + "\n")

            # move back to script's dir
            os.chdir(ROOT_PATH)
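As an alternative sketch, the cwd argument of subprocess.run starts the child process in a given directory, so the parent script never has to call os.chdir at all. The function name run_in_subdirectories is made up for this example, and it assumes the command should run once per immediate subdirectory.

```python
import os
import subprocess


def run_in_subdirectories(root, command):
    """Run command once per immediate subdirectory of root, using cwd= instead of os.chdir.

    Returns a dict mapping each subdirectory name to the command's captured stdout.
    """
    outputs = {}
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if entry.is_dir():
            completed = subprocess.run(
                command,
                cwd=entry.path,       # the child process starts in this directory
                capture_output=True,  # collect stdout and stderr instead of printing them
                text=True,            # decode the output to str
                check=False,          # do not raise if the command fails
            )
            outputs[entry.name] = completed.stdout
    return outputs
```

For instance, run_in_subdirectories(".", ["git", "log", "-n1"]) would collect the latest commit of each subdirectory, and passing ["python", "my_script.py"] (with a path the child process can resolve) would run a script there instead.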
Vladislav Povorozniuc