
I am trying to get the full paths of a group of files whose names I have in a list. The files are in different subfolders. I am using os.walk and nested loops to go through the files, appending each complete path to a new list for use in a different program. But there is an error in the code: only the first cycle of the outer loop finds anything.

The code is based on this thread: Need the path for particular files using os.walk()

I am using Python 3.6 on macOS 10.14.6. I am not sure if it matters, but the directories are on an external hard drive.

    import pandas as pd
    import os

    dir = "/Volumes/dir1/dir2"
    fastafiles = ["file1", "file2", "file3"]
    fastafiles_df = pd.DataFrame(fastafiles)

    fasta_paths = []

    for fasta in fastafiles_df[0]:
        #1
        for dir, subdirs, files in os.walk(dir):
            for file in files:
                if file.endswith(fasta):
                    #2
                    fasta_paths.append(os.path.join(dir, file))
                    #3

Running the code gives me one entry in fasta_paths, containing only the path of the first file.

If I print(fasta) at #1, I get all 3 file names from my dataframe.

If I print(file) at #2, I get only 1 file name, and if I print fasta_paths at #3, I get only the path of the first file.

Could someone point out why the loop does not continue?

shuberman
newbie_here

1 Answer


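The error comes from a name collision: dir is used both for the root directory and as the loop variable that unpacks each tuple yielded by os.walk. Each iteration of the inner loop rebinds dir to the directory currently being visited, so when the outer loop reaches the second file name, os.walk(dir) starts from the last directory visited in the first pass instead of from the root. Also keep in mind that dir is the name of a Python built-in function, so it is best not to use it as a variable name.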
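To see the rebinding in isolation, here is a minimal sketch. The temporary directory layout (`a/file1`, `b/file2`) and the names `start` and `visited_per_pass` are made up for illustration and are not part of the original code.

```python
import os
import tempfile

# Hypothetical layout for illustration: <root>/a/file1 and <root>/b/file2
with tempfile.TemporaryDirectory() as root:
    for sub, name in (("a", "file1"), ("b", "file2")):
        os.makedirs(os.path.join(root, sub))
        open(os.path.join(root, sub, name), "w").close()

    start = root
    visited_per_pass = []
    for _ in range(2):
        count = 0
        # Reusing `start` as the loop variable rebinds it on every iteration,
        # exactly like `dir` in the question.
        for start, subdirs, files in os.walk(start):
            count += 1
        visited_per_pass.append(count)

# First pass visits the root plus two subfolders; the second pass starts
# from whichever leaf folder was visited last, so it sees one directory.
print(visited_per_pass)  # [3, 1]
```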

Can you try with the variable names changed?

    import pandas as pd
    import os

    root_dir = "/Volumes/dir1/dir2"
    fastafiles = ["file1", "file2", "file3"]
    fastafiles_df = pd.DataFrame(fastafiles)

    fasta_paths = []

    for fasta in fastafiles_df[0]:
        #1
        for curr_dir, subdirs, files in os.walk(root_dir):
            for file in files:
                if file.endswith(fasta):
                    #2
                    fasta_paths.append(os.path.join(curr_dir, file))
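As a possible variation, the tree can be walked once instead of once per file name, since str.endswith accepts a tuple of suffixes. This is only a sketch: the helper name `find_paths` is hypothetical, and it assumes the order of the results does not matter (paths come out grouped by directory rather than by the order of the names in the list).

```python
import os


def find_paths(root_dir, suffixes):
    """Return full paths of files under root_dir whose names end with any suffix."""
    suffixes = tuple(suffixes)  # str.endswith accepts a tuple of candidates
    matches = []
    # Single pass over the tree; root_dir is never rebound inside the loop.
    for curr_dir, subdirs, files in os.walk(root_dir):
        for name in files:
            if name.endswith(suffixes):
                matches.append(os.path.join(curr_dir, name))
    return matches
```

With the variables from the code above, this would be called as `fasta_paths = find_paths(root_dir, fastafiles)`.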
Nidhin Bose J.