I have a list of files that are arranged in the following format:

'folder/sensor_01/2021/12/31/005_6_0.csv.gz', 
'folder/sensor_01/2022/01/01/005_0_0.csv.gz', 
'folder/sensor_01/2022/01/02/005_1_0.csv.gz', 
'folder/sensor_01/2022/01/03/005_4_0.csv.gz',
....

I want to keep only the entries that fall within a given time range. In each path, the segments after sensor_01 and before 005 give the date (down to day resolution).

I am stuck on how to extract this date segment from the path and convert it to a Python datetime object. Once I have that, I think I can use the comparison operators to filter the entries.

Luca
  • Use regex. Treat each path as a string and match it against a pattern with a capture group for the part you want. Once you have the group, convert that to a date or datetime object – leoOrion May 30 '22 at 10:06

2 Answers

The answer is string-to-datetime parsing.

Split

You can split the path on "/" to get the year, month, and day parts.

file = 'folder/sensor_01/2021/12/31/005_6_0.csv.gz'
file.split("/")
# ['folder', 'sensor_01', '2021', '12', '31', '005_6_0.csv.gz']

Here the elements at indices 2, 3 and 4 (zero-based) are the year, month and day.
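As a minimal sketch of how those three elements might be turned into a date, assuming every path has the same depth as the example (the helper name path_to_date is hypothetical):

```python
from datetime import date

def path_to_date(path):
    """Build a date from the year, month and day segments of a path.

    Assumes the layout folder/sensor_XX/YYYY/MM/DD/file, as in the example.
    """
    parts = path.split("/")
    # Indices 2, 3 and 4 hold the year, month and day as strings.
    year, month, day = (int(p) for p in parts[2:5])
    return date(year, month, day)

file = 'folder/sensor_01/2021/12/31/005_6_0.csv.gz'
print(path_to_date(file))  # 2021-12-31
```

The resulting date objects support the usual comparison operators, so they can be checked against a start and end date.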

Or

strptime

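See https://stackoverflow.com/a/466376/2681662. You can create a datetime object directly from a string with strptime. The format string may contain arbitrary literal text around the %Y, %m and %d directives, so the whole path can serve as the format, provided the literal parts match the path exactly: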

from datetime import datetime

file = 'folder/sensor_01/2021/12/31/005_6_0.csv.gz'
datetime.strptime(file, 'folder/sensor_01/%Y/%m/%d/005_6_0.csv.gz') # This is valid
# datetime.datetime(2021, 12, 31, 0, 0)
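Because the file name differs from path to path, a format string with the file name hard-coded only parses that one file. One possible variation, sketched here under the assumption that the date always occupies the path segments at indices 2 to 4, is to parse only those segments (parse_date is a hypothetical helper, not part of the answer above):

```python
from datetime import datetime

def parse_date(path):
    """Parse the YYYY/MM/DD segments of a path into a datetime."""
    # Keep only the segments at indices 2, 3 and 4 and rejoin them.
    date_part = "/".join(path.split("/")[2:5])
    # strptime raises ValueError if the segments are not a valid date.
    return datetime.strptime(date_part, "%Y/%m/%d")

print(parse_date('folder/sensor_01/2022/01/01/005_0_0.csv.gz'))
# 2022-01-01 00:00:00
```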
MSH

This can be done with a regex:

\S+\/sensor_[\d]+\/([\S\/]+)\/[\S_]+\.csv\.gz

I have used this regex to match and group the date portion of one of the strings.

In [11]: import re

In [12]: string = 'folder/sensor_01/2021/12/31/005_6_0.csv.gz'

In [13]: reg = r'\S+\/sensor_[\d]+\/([\S\/]+)\/[\S_]+\.csv\.gz'  # raw string avoids invalid-escape warnings

In [15]: re.match(reg, string).groups()[0]
Out[15]: '2021/12/31'
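To go from the captured group to the filtering the question asks for, a sketch along the following lines may help. It reuses the pattern above as a raw string; the function name filter_by_date and the inclusive range are assumptions made for illustration.

```python
import re
from datetime import datetime

# Same pattern as above, compiled once and written as a raw string.
PATTERN = re.compile(r'\S+\/sensor_[\d]+\/([\S\/]+)\/[\S_]+\.csv\.gz')

def filter_by_date(paths, start, end):
    """Return the paths whose date lies in [start, end], both inclusive."""
    selected = []
    for path in paths:
        match = PATTERN.match(path)
        if match is None:
            continue  # skip paths that do not follow the expected layout
        # The group looks like '2021/12/31'; strptime raises ValueError
        # if it is not a valid YYYY/MM/DD date.
        day = datetime.strptime(match.group(1), '%Y/%m/%d')
        if start <= day <= end:
            selected.append(path)
    return selected

files = [
    'folder/sensor_01/2021/12/31/005_6_0.csv.gz',
    'folder/sensor_01/2022/01/01/005_0_0.csv.gz',
    'folder/sensor_01/2022/01/02/005_1_0.csv.gz',
    'folder/sensor_01/2022/01/03/005_4_0.csv.gz',
]
print(filter_by_date(files, datetime(2022, 1, 1), datetime(2022, 1, 2)))
# ['folder/sensor_01/2022/01/01/005_0_0.csv.gz', 'folder/sensor_01/2022/01/02/005_1_0.csv.gz']
```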

leoOrion