I have a folder called Contracts and then in that folder I have folders for several companies. In the company folders I have several contracts that we have with those companies. I am trying to get a data frame that has two columns, Folder_Name and Contract.

I tried to follow the question "Python list directory, subdirectory, and files", which got me close, I think, but I could not get a column with the name of the folder each contract came from.

I thought this would work:

import pathlib, sys, os
import pandas as pd

cwd = os.getcwd()

lst1 = []
lst2 = []
for path, subdir, file in os.walk(cwd):
    for i in subdir:
        for name in file:
            lst1.append(i)
            lst2.append(name)
        
df = pd.DataFrame(zip(lst1, lst2), columns = ['Folder_Name', 'Contract'])

but it only gave me the folder names in one column and the names of the files in the Contracts folder itself, instead of the files in the company folders:

    Folder_Name         Contract
0   .ipynb_checkpoints  Untitled.ipynb
1   AWS                 Untitled.ipynb

1 Answer

I ran this code:

import pathlib, sys, os
import pandas as pd

cwd = os.getcwd()

lst1 = []
lst2 = []
for path, subdir, file in os.walk(os.path.join(cwd,'Contracts')):
    print(path, subdir, file)
    for i in subdir:
        for name in file:
            print(i,name)

I ran it in an example folder and found your problem.

Here is the console response

As you can see, when subdir is full, file is empty, and when file is full, subdir is empty.

In fact, subdir lists the folders directly inside the path you are currently in, whereas file lists only the files directly inside that path. In your situation each directory contains either folders or files, but never both at the same time. That's why one of the two nested loops always iterates over an empty list and the inner print never runs.
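To make that behavior concrete, here is a small sketch that builds a throwaway tree and records what os.walk yields for each directory. The company and file names (AWS, Acme, a.pdf, ...) are made up for illustration, and the layout only assumes what the question describes: company folders inside Contracts, with files inside each company folder.

```python
import os
import tempfile

# Hypothetical layout, for illustration only:
# Contracts/AWS/a.pdf, Contracts/AWS/b.pdf, Contracts/Acme/c.pdf
layout = {"AWS": ["a.pdf", "b.pdf"], "Acme": ["c.pdf"]}

walked = []
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "Contracts")
    for company, files in layout.items():
        os.makedirs(os.path.join(root, company))
        for name in files:
            # create an empty placeholder file
            open(os.path.join(root, company, name), "w").close()

    for path, subdir, file in os.walk(root):
        # store the path relative to Contracts plus sorted copies of both lists
        entry = (os.path.relpath(path, root), sorted(subdir), sorted(file))
        walked.append(entry)
        print(entry)

# os.walk does not guarantee an order, so sort before comparing
walked.sort()
```

With this layout, the entry for Contracts itself has folders but no files, and each company entry has files but no folders, so a loop nested as "for i in subdir: for name in file:" never reaches its body.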

I tried to write something that works in the situation you described. It is a little bit longer, but you can try this:

import os
import pandas as pd

cwd = os.getcwd()
contracts_path = os.path.join(cwd, 'Contracts')

lst1 = []  # company folder name, one entry per contract
lst2 = []  # contract file name
for _, companies, _ in os.walk(contracts_path):
    for company in companies:
        # walk each company folder and record every file found in it
        for _, _, files in os.walk(os.path.join(contracts_path, company)):
            for name in files:
                lst1.append(company)
                lst2.append(name)
    break  # only the top level of Contracts lists the company folders

df = pd.DataFrame(zip(lst1, lst2), columns=['Folder_Name', 'Contract'])
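As a possible alternative, the same pairs can be collected with pathlib instead of nested os.walk calls. This is only a sketch under the same assumed layout: the helper name list_contracts and the demo file names are hypothetical, and it returns plain tuples so it runs without pandas.

```python
import tempfile
from pathlib import Path


def list_contracts(contracts_dir):
    """Return sorted (folder_name, contract) pairs for files under each company folder."""
    rows = []
    for company_dir in Path(contracts_dir).iterdir():
        if not company_dir.is_dir():
            continue  # skip loose files sitting directly in Contracts
        # rglob("*") also descends into any subfolders of the company folder
        for item in company_dir.rglob("*"):
            if item.is_file():
                rows.append((company_dir.name, item.name))
    return sorted(rows)


# Demo on a throwaway tree with made-up names
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "Contracts"
    for company, name in [("AWS", "a.pdf"), ("AWS", "b.pdf"), ("Acme", "c.pdf")]:
        (root / company).mkdir(parents=True, exist_ok=True)
        (root / company / name).touch()
    rows = list_contracts(root)

print(rows)
```

The resulting list can then be turned into the two-column frame with pd.DataFrame(rows, columns=['Folder_Name', 'Contract']).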