-1

I'm really new to python and programming in general, and to practice I'm doing projects where I try to tackle problems from my day to day work, so please excuse me if this may be a silly question. I'm trying to combine a group of files located on a remote folder into a single monthly one based on the date, I've already combined files based on date so I think I can do that, but I'm having trouble with the regex to pick the date from the file name string, the string with the filepath is as follows

\\machinename123\main folder\subfolder\2021-01-24.csv

The file name will always have the same format since it's and automated process, only changing the date on the name, I was trying to pick the date from this string using a regex to select the text between the last \ of the string and the . from the format, so I can get a 2021-01-24 as a result but at the level I'm at, regex are like witchcraft and I don't really know what I'm doing, I've been trying for a few hours to no success, so far this is the closest I can get by trial and error (?:[0-9\-]) but this selects all the numbers on the string, including the ones on the machine name, besides the issue of not knowing why it works the way it works (for example I know that the ?: works by testing, but I don't understand the theory behind it so I couldn't replicate it in the future).

How can I make it ignore the other numbers on the string, or more specifically pick only the text between the last \ and the . from the csv, xlsx or whatever the format is?

I'd like the former option better, since it would allow me to learn how to make it do what I need it to do and not get the result by coincidence.

Thanks for any help

Jhilbran
  • 31
  • 3

2 Answers2

0

Use re.search() to find a pattern: <4 digits>-<2 digits>-<2 digits>.

s = r'\\machinename123\main folder\subfolder\2021-01-24.csv'
m = re.search(r'\d{4}-\d{2}-\d{2}', s).group(0)
Michael
  • 5,095
  • 2
  • 13
  • 35
0

You can use the following regex, that summarizes the structure of your full path:

import re 

filename_regex = re.compile(r'^\\\\[(?:\w| )+\\]+((?:\d{4,4})-(?:\d{2,2})-(?:\d{2,2})).csv$')

m = filename_regex.match(r"\\machinename123\main folder\subfolder\2021-01-24.csv")
if m is not None:
    print(f'File found with date: {m.groups()[0]}')
else:
    print('Filename not of interest')

The output will be:

File found with date: 2021-01-24

filename_regex accepts string starting with \\, followed by a repetition of characters (alphanumeric and underscores) and spaces followed by \, but with the final part corresponding to 4 digits, followed by a minus, then 2 digits, a minus again, 2 digits and the string .csv. The regular expression used here to match the date is very simple, but you can use a more complex one if you prefer.

Another simpler approach would be using ntpath library, extracting the name of the file from the full path and applying the regular expression only to the name of the file:

import ntpath
import re

filename_regex = re.compile(r'^((?:\d{4,4})-(?:\d{2,2})-(?:\d{2,2})).csv$')

filename = ntpath.basename(r"\\machinename123\main folder\subfolder\2021-01-24.csv")
m = filename_regex.match(filename)

m here will have the same value as before.

PieCot
  • 3,564
  • 1
  • 12
  • 20