
I am trying to extract the paths of files stored in a directory. I only want the first file in each directory, and I want to store its path in a DataFrame.

I have all the directories in a list, and I would like to loop over that list and fetch just the first file name in each directory.

paths = [path1, path2, path3]

The script below fetches the path of every file under a particular directory (prefix) in the bucket:

for obj in bucket.list(prefix="path1"):
    key_string = str(obj.key)
    print(key_string)

This prints the paths of all files in a single directory. How can I pass each path from my list, iterate through them, and store the path of the first file in each directory in a DataFrame?
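A minimal sketch of the loop described above. It assumes boto3 for the real listing (the question's `bucket.list(prefix=...)` call appears to be the older boto 2 API), and injects the listing function so the logic runs without AWS; `first_key_per_prefix`, `fake_listing`, and the bucket name `my-bucket` are hypothetical. For a standard S3 bucket, keys come back in ascending UTF-8 binary order, so "first" means the lexicographically smallest key under the prefix.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def first_key_per_prefix(
    prefixes: Iterable[str],
    list_keys: Callable[[str], Iterable[str]],
) -> List[Tuple[str, Optional[str]]]:
    """Return (prefix, first key) pairs, with None for an empty prefix.

    list_keys is any callable that yields object keys under a prefix
    (hypothetical parameter, injected so this runs without AWS).
    """
    rows = []
    for prefix in prefixes:
        # next() takes only the first key, so a lazy listing is not exhausted
        rows.append((prefix, next(iter(list_keys(prefix)), None)))
    return rows


# With boto3 (assumed; "my-bucket" is a hypothetical name), the callable could be:
#   bucket = boto3.resource("s3").Bucket("my-bucket")
#   list_keys = lambda p: (obj.key for obj in bucket.objects.filter(Prefix=p).limit(1))

# Stand-in listing so the sketch runs locally
fake_listing = {
    "path1/": ["path1/a.csv", "path1/b.csv"],
    "path2/": ["path2/x.csv"],
    "path3/": [],
}
rows = first_key_per_prefix(["path1/", "path2/", "path3/"], lambda p: fake_listing[p])
print(rows)  # [('path1/', 'path1/a.csv'), ('path2/', 'path2/x.csv'), ('path3/', None)]
```

Passing `rows` to `pd.DataFrame(rows, columns=["prefix", "first_key"])` gives one row per directory. Ending each prefix with `/` keeps `path1` from also matching keys under `path10/`.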

Kevin Nash

1 Answer

There are multiple ways of doing this; here's one:

  1. Use glob to list the matching files in each directory (path).
  2. Extract the first file's name using os.path.basename.
  3. Append the file names to a list.
  4. Build a DataFrame from that list.

You can also use glob to walk all subdirectories recursively (a `**` pattern with `recursive=True`) if you wish. See this answer.
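A minimal sketch of that recursive variant, assuming "first" means alphabetically first (glob itself guarantees no particular order) and that `.txt` files are wanted; `first_file_per_subdir` is a hypothetical helper name. It maps each directory under the root, root included, to the full path of its first matching file.

```python
import glob
import os
from typing import Dict


def first_file_per_subdir(root: str, pattern: str = "*.txt") -> Dict[str, str]:
    """Map each directory under root (root included) to its first matching file."""
    # "**" with recursive=True matches root itself and every subdirectory
    matches = sorted(glob.glob(os.path.join(root, "**", pattern), recursive=True))
    first: Dict[str, str] = {}
    for file_path in matches:
        if os.path.isfile(file_path):
            # Paths are sorted, so the first file seen per directory is
            # the alphabetically first file name in that directory
            first.setdefault(os.path.dirname(file_path), file_path)
    return first
```

`pd.DataFrame(list(result.items()), columns=["directory", "first_file"])` would turn the returned dict into a DataFrame.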

import glob
import os

import pandas as pd

list_of_filenames = []
paths = ['C:/git/test/folder1', 'C:/git/test/folder2']
for path in paths:
    # Non-recursive match of .txt files; sorted() because glob's order is arbitrary
    file_path_names = sorted(glob.glob(os.path.join(path, '*.txt')))
    if file_path_names:  # skip directories with no matching files
        list_of_filenames.append(os.path.basename(file_path_names[0]))

df = pd.DataFrame(list_of_filenames, columns=['file_names'])
print(df)

          file_names
0  folder1_file1.txt
1  folder2_file1.txt
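The question asks for the path of the first file, while the code above keeps only the file name. A minimal sketch that records each directory together with the full path of its first match, using `pathlib` in place of `glob` (a swapped-in technique) and `None` for directories with no match; `first_file_rows` is a hypothetical name.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def first_file_rows(
    directories: Iterable[str], pattern: str = "*.txt"
) -> List[Tuple[str, Optional[str]]]:
    """Return (directory, full path of its alphabetically first match) pairs."""
    rows = []
    for directory in directories:
        # Path.glob with "*.txt" is non-recursive; sorted() fixes the order
        files = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
        rows.append((directory, str(files[0]) if files else None))
    return rows
```

`pd.DataFrame(first_file_rows(paths), columns=["directory", "first_file_path"])` then holds one row per directory, keeping empty directories visible instead of silently dropping them.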
Chris