Revisit "How to find the position/index of a particular file in a directory?"

Question

I have a question from the following discussion:

How to find the position/index of a particular file in a directory?

Suppose I have three excel files in a folder: test_3d, test_3d1, test_3d2

It says we can get the index of a file with the following code:

import os

folder = r'C:\Users\Denny\Desktop\Work\test_read'
files = os.listdir(folder)
files.index('test_3d1.xlsx')

>> 1

Also, we can read the data of each file by

import os
import pandas as pd

folder = r'C:\Users\Denny\Desktop\Work\test_read'
files = os.listdir(folder)
dfs = {}
for file in files:
    if file.endswith('.xlsx'):
        dfs[file[:-5]] = pd.read_excel(os.path.join(folder,file), header = None, skiprows=[0], usecols = "B:M")

dfs['test_3d1']

Also, we can show all its files by using

files
>> ['test_3d.xlsx', 'test_3d1.xlsx', 'test_3d2.xlsx']

My question now is how to get the data of each file not by its name

dfs['test_3d1']

but by its index, for example

dfs['files[1]']   # I want to pick the 2nd file, 'test_3d1', from files.

However, this raises an error.

How can I fix it?

score 2 · Answer 1 · answered Sep 11 '21 at 16:07

If you want to look up values in the dictionary as in the screenshot you posted, you can use dfs[files[1][:-5]]. This takes the file name at index 1 and strips the file extension, just as you did when building the dfs dictionary. The original dfs['files[1]'] fails because the quotes turn files[1] into a literal string, which is not a key in dfs.
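
A minimal sketch of that lookup. The hard-coded file list and the placeholder strings standing in for the DataFrames are assumptions for illustration, so the snippet runs without the Excel files. os.path.splitext is shown as an alternative to the [:-5] slice that does not depend on the length of the extension.

```python
import os

# Stand-ins for the question's data: in the real code, `files` comes from
# os.listdir(folder) and the values are DataFrames from pd.read_excel.
files = ['test_3d.xlsx', 'test_3d1.xlsx', 'test_3d2.xlsx']
dfs = {name[:-5]: f'<data from {name}>' for name in files}

# files[1] is evaluated first, then the extension is sliced off to form the key.
key = files[1][:-5]
print(key)       # test_3d1
print(dfs[key])  # <data from test_3d1.xlsx>

# Same key without hard-coding the extension length.
stem = os.path.splitext(files[1])[0]
print(stem == key)  # True
```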

Optionally, you could define the __missing__ method on a dict subclass to change how keys that aren't present in the dictionary are handled. Using the recipe from this answer, you could fall back to a positional lookup on the dictionary's values(), or strip the file extension from the key and return the value stored under the resulting key. You can then write dfs[files[1]] without stripping the extension each time.

In [1]: class smart_dict(dict):
   ...:     def __missing__(self, key):
   ...:         if isinstance(key, int):             # skip this if you don't plan
   ...:             return list(self.values())[key]  # to use ints directly
   ...:             # or
   ...:             # return self[list(self.keys())[key]]
   ...:         if isinstance(key, str) and key.endswith('.xlsx'):
   ...:             return self[key[:-5]]
   ...:         raise KeyError(key)
   ...:

In [2]: dfs = smart_dict()
   ...: dfs['a'] = 'A'
   ...: dfs['b'] = 'B'

In [3]: dfs['a']  # normal usage
Out[3]: 'A'

In [4]: dfs[0]  # index-based lookup
Out[4]: 'A'

In [5]: dfs[1]
Out[5]: 'B'

In [6]: dfs['a.xlsx']  # lookup with the filename
Out[6]: 'A'

In [7]: dfs['does not exist']  # still raises KeyError
...
KeyError: 'does not exist'

In [8]:  dfs['nope.xlsx']  # also raises KeyError
...
KeyError: 'nope'

Note that list(dict.values())[index] has an overhead for integer keys: values() can't be indexed directly, so every lookup builds a new list of all the values. Avoid it if you do such lookups often. The '.xlsx' extension-removal lookup is fine, since it merely strips the extension and then uses the result directly as a key.
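
One way to sidestep that overhead, sketched here with placeholder values: build the list of keys once and reuse it for every positional lookup. The names keys_by_position and by_position are hypothetical helpers, not part of the recipe above, and the approach relies on dicts preserving insertion order (Python 3.7+).

```python
# Placeholder values stand in for the DataFrames from the question.
dfs = {'test_3d': 'A', 'test_3d1': 'B', 'test_3d2': 'C'}

# Build the positional view once instead of on every lookup.
keys_by_position = list(dfs)

def by_position(i):
    """Return the value stored under the i-th inserted key."""
    return dfs[keys_by_position[i]]

print(by_position(1))  # B
```

The cached list has to be rebuilt whenever keys are added to or removed from the dictionary.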

Ofek Glick · Accepted Answer · 2021-09-11T14:00:43.070

Notice the object types you are working with, in this piece of code:

folder = r'C:\Users\Denny\Desktop\Work\test_read'
files = os.listdir(folder)
dfs = {}
for file in files:
    if file.endswith('.xlsx'):
        dfs[file[:-5]] = pd.read_excel(os.path.join(folder,file), header = None, skiprows=[0], usecols = "B:M")

dfs['test_3d1']

Notice that dfs is a dictionary, so we access each entry by its key, and here the key was defined as the name of the file (without its extension).
If you want the key to be the file's index instead, store it that way:

folder = r'C:\Users\Denny\Desktop\Work\test_read'
files = os.listdir(folder)
dfs = {}
# Here is the change
for i,file in enumerate(files):
    if file.endswith('.xlsx'):
        dfs[i] = pd.read_excel(os.path.join(folder,file), header = None, skiprows=[0], usecols = "B:M")
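
Two caveats apply to the index-keyed version. The keys are positions in files, so any non-.xlsx entry in the folder leaves a gap in the numbering, and os.listdir returns names in arbitrary order. A hedged sketch of one way around both is to filter and sort the names first; excel_files is a hypothetical helper, and the demo uses a temporary directory with empty placeholder files rather than real workbooks.

```python
import os
import tempfile

def excel_files(folder):
    """Return the .xlsx file names in `folder`, sorted by name."""
    return sorted(f for f in os.listdir(folder) if f.endswith('.xlsx'))

# Demo with a throwaway directory containing empty placeholder files.
with tempfile.TemporaryDirectory() as folder:
    for name in ['test_3d2.xlsx', 'notes.txt', 'test_3d.xlsx', 'test_3d1.xlsx']:
        open(os.path.join(folder, name), 'w').close()
    files = excel_files(folder)

print(files)     # ['test_3d.xlsx', 'test_3d1.xlsx', 'test_3d2.xlsx']
print(files[1])  # test_3d1.xlsx
```

With the names filtered up front, building dfs by enumerating this list gives consecutive keys, so dfs[1] always corresponds to files[1].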
