
Currently my script looks for ESC in a specific .xlsx file name and takes the characters that follow it, which in my case are the date. The file name looks like this: xxx_2392469513_1700001_ESC_2020_01.xlsx

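import os
import re

# path, path2, path3 and path4 are defined elsewhere in the script
filenames = os.listdir(os.path.join(path, path2, path3, path4))
for filename in filenames:
    # Raw string so that \w is passed to the regex engine unchanged
    getdate = re.search(r'(?<=ESC_)\w+', filename)

    # Replace '_' with '-'
    if getdate:
        date = getdate.group(0).replace('_', '-')
        print('The following ESC file has date', date)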

"The following ESC file has date 2020-01"

With this I get the date. However, I noticed that not every filename has the date directly after ESC, e.g. xxx_2392469513_1700001_ESC_something_2020_01.xlsx. It is still crucial for me to check only the filenames that contain ESC.

How can I get the last 7 characters of that filename with re.search?

Wiktor Stribiżew
Trunks

2 Answers

1

Is regex a requirement, e.g. for a school task? Simple string slicing with [-7:] gives you the last 7 characters. If you only need the names that contain ESC, filter for it:

filenames = ['ESCdasdsadasd', 'yrfgreufre', 'dsfdESCfdgdf']
for filename in filenames:
  if 'ESC' in filename:
    print(filename[-7:])

This prints the last 7 characters of each string that contains ESC.
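As the comments below point out, slicing a real file name this way returns part of the extension. A minimal sketch of one way around that, assuming the extension should simply be dropped with os.path.splitext before slicing (the helper name last_seven_before_extension is made up for illustration):

```python
import os

def last_seven_before_extension(filename):
    """Return the last 7 characters of the name without its extension,
    or None if the name does not contain 'ESC'.
    (Hypothetical helper, not part of the original answer.)"""
    if 'ESC' not in filename:
        return None
    # splitext separates 'name.xlsx' into ('name', '.xlsx')
    stem, _ext = os.path.splitext(filename)
    return stem[-7:]

names = [
    'xxx_2392469513_1700001_ESC_2020_01.xlsx',
    'xxx_2392469513_1700001_ESC_something_2020_01.xlsx',
]
for name in names:
    tail = last_seven_before_extension(name)
    if tail is not None:
        print(tail.replace('_', '-'))
```

This only works if the date really is the last 7 characters of the name; it does not check that those characters look like a date.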

Drako
  • With the filename `xxx_2392469513_1700001_ESC_2020_01.xlsx` this gives `01.xlsx`, but the OP wants `2020_01`; they don't want the extension. – Toto Mar 10 '20 at 11:25
  • @Toto It's easy to get file names without the extension in Python, and that's a different question. I showed a method to extract the last 7 characters and filter on the presence of ESC, which was the issue. If necessary, he can ask a separate question about how to get a file name without the extension, but that is already answered on SO here: https://stackoverflow.com/questions/678236/how-to-get-the-filename-without-the-extension-from-a-path-in-python – Drako Mar 10 '20 at 11:43
  • @Toto You could also cut it off with slicing if all extensions have the same length, but that's a dirty approach, of course. – Drako Mar 10 '20 at 11:45
  • Then, you don't answer the question. – Toto Mar 10 '20 at 12:25
1

If you want to fix your current regex approach, you may use

filenames = os.listdir(os.path.join(path, path2, path3, path4))
for filename in filenames:
    # Raw string so that \. is passed to the regex engine unchanged
    getdate = re.search(r'ESC_.*(.{7})\.[^.]+$', filename)
    if getdate:
        date = getdate.group(1).replace('_', '-')
        print('The following ESC file - {} - has {} date'.format(filename, date))
    else:
        print('No date found in {}'.format(filename))

The ESC_.*(.{7})\.[^.]+$ pattern matches

  • ESC_ - an ESC_ string
  • .* - any 0+ chars other than line break chars as many as possible
  • (.{7}) - Capturing group 1: any seven chars other than line break chars
  • \. - a dot
  • [^.]+ - 1+ chars other than a dot
  • $ - end of string.
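To see the pattern at work, here is a small self-contained sketch that runs it against the two file names from the question plus one made-up name without ESC. The function name extract_date and the third sample are assumptions added for illustration only.

```python
import re

# Pattern from the answer above, compiled once as a raw string
PATTERN = re.compile(r'ESC_.*(.{7})\.[^.]+$')

def extract_date(filename):
    """Return the 7 characters before the extension with '_' replaced
    by '-', or None if the pattern does not match.
    (Hypothetical helper for illustration.)"""
    match = PATTERN.search(filename)
    if match is None:
        return None
    return match.group(1).replace('_', '-')

samples = [
    'xxx_2392469513_1700001_ESC_2020_01.xlsx',
    'xxx_2392469513_1700001_ESC_something_2020_01.xlsx',
    'xxx_2392469513_1700001_2020_01.xlsx',  # made-up name without ESC
]
for name in samples:
    print(name, '->', extract_date(name))
```

Note that (.{7}) accepts any seven characters, not only dates. If the date always has the form YYYY_MM, which is an assumption about the data, a stricter group such as (\d{4}_\d{2}) could be used instead.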
Wiktor Stribiżew