I searched a few related discussions, such as "Read most recent excel file from folder PYTHON"; however, they do not quite fit my requirement.

Suppose I have a folder with the following .xlsx files

[Image: listing of the .xlsx files in the folder]

I want to read only the files whose names look like "T2xxMhz", i.e., the last 7 files.

I have the following code:

import os
import pandas as pd

folder = r'C:\Users\work'    # <--- find the folder
files = os.listdir(folder)   # <--- find files in the folder 'work'
dfs = {}
for i, file in enumerate(files):
    if file.endswith('.xlsx'):
        # <--- read specific sheet with the name 'Z=143'
        dfs[i] = pd.read_excel(os.path.join(folder, file), sheet_name='Z=143',
                               header=None, skiprows=[0], usecols="B:M")

num = len(dfs)   # <--- number of files actually read

However, with this code I cannot differentiate between the two types of file name, 'PYTEST' and 'T2XXX'.

How can I deal with this problem? Any suggestions or hints are welcome!

Denny

1 Answer

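Use the glob package. It matches file names against shell-style wildcard patterns (these are not regular expressions), and a single pattern can contain several wildcards.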

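import glob
import os

folder = 'path/to/files/'   # avoid the name 'dir', which shadows a Python builtin
# names that start with 'T' and contain 'Mhz' somewhere after it
flist = glob.glob(os.path.join(folder, 'T*Mhz*'))
print(flist)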
Abhishek Jain
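To connect this back to the question, here is a minimal sketch of how the glob result might be fed into pd.read_excel. The helper names find_workbooks and read_workbooks are made up for illustration, the default pattern 'T*Mhz*.xlsx' is an assumption about how the files are named, and the read_excel arguments are simply copied from the question.

```python
import glob
import os


def find_workbooks(folder, pattern="T*Mhz*.xlsx"):
    """Return the sorted paths in `folder` whose file names match `pattern`.

    The default pattern is an assumption about the file naming scheme.
    """
    return sorted(glob.glob(os.path.join(folder, pattern)))


def read_workbooks(paths, sheet_name="Z=143"):
    """Read one sheet from each workbook into a dict keyed by file name."""
    # Imported here so that find_workbooks can be used without pandas installed.
    import pandas as pd

    return {
        os.path.basename(path): pd.read_excel(
            path, sheet_name=sheet_name, header=None, skiprows=[0], usecols="B:M"
        )
        for path in paths
    }
```

With these, dfs = read_workbooks(find_workbooks(r'C:\Users\work')) should hold only the T...Mhz files, and len(dfs) gives the number of files read. Keying the dict by file name rather than by loop index avoids gaps in the keys.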
  • I just tried your method; however, it shows '[ ]'. Should I use "T*Mhz", or just "T", since the number after T is different? Thanks! – Denny Jan 04 '22 at 08:15
  • There's a star after Mhz too. T* - anything after T. T*Mhz - anything after T, which must be followed by Mhz at the very end of the name. T*Mhz* - anything after T, followed by Mhz, followed by anything (in this case, the file extension). – Abhishek Jain Jan 04 '22 at 08:34