0

So this has been answered here: making output of datefinder into list Unfortunately my rep is too low so I can't comment to get clarity on why it's not functioning as expected. I want to take filename strings and turn them into a list of dates so that I can then use them as a fill for the date column. All of the filenames include event dates, but they are not on the sheets themselves. The format is: CompanyNameEventLocationDDMMYYYY.xlsx

import glob
import datefinder
#get all Excel files within folder
path = r"C:\Users\me\Documents\Events\Spreadsheets\Consolidated\*.xlsx"
filename = glob.glob(path)
#get dates from filenames
event_dates = (datefinder.find_dates(f) for f in filename)
#check output
for days in event_dates:
    print(days.strftime("%Y-%m-%d %H:%M:%S"))

Instead of getting converted objects I get the following error: AttributeError:

'generator' object has no attribute 'strftime'

When I try converting it directly to a list with

date_list = list(event_dates)
print(event_dates)

I still get output saying that it's still a generator object:

<generator object <genexpr> at 0x00000230571AE660>

What do I need to change so that it actually becomes a list of datetimes that I can use to populate an appended 'Date' column by forward filling?

Naeblis
  • 54
  • 8

2 Answers2

0

Your code does not work because the find_dates returns you a generator and events date contains a list of generator. It could work with the chain.from_iterable like following

import glob
import datefinder
from itertools import chain
#get all Excel files within folder
path = r"C:\Users\me\Documents\Events\Spreadsheets\Consolidated\*.xlsx"
filename = glob.glob(path)
#get dates from filenames
event_dates = (datefinder.find_dates(f) for f in filename)
#check output
for days in chain.from_iterable(event_dates):
    print(days.strftime("%Y-%m-%d %H:%M:%S"))

But I try to use datefinder on a filename with your format and it does not work well. I think it is not designed for this.

event_dates = datefinder.find_dates('afilename_01-08-2019.xls')
for match in event_dates:
    print(match)
event_dates = datefinder.find_dates('afilename01082019.xls')
for match in event_dates:
    print(match)

this previous code prints:

2019-01-08 00:00:00

If your format is always CompanyNameEventLocationDDMMYYYY.xlsx, you should parse day, month and year number and give them to a date object:

import datetime
import glob
#get all Excel files within folder
path = r"C:\Users\me\Documents\Events\Spreadsheets\Consolidated\*.xlsx"
filenames = glob.glob(path)

for filename in filenames:
    full_date = filename.split('.')[0]
    day, month, year = full_date[-8:-6], full_date[-6:-4], full_date[-4:]
    real_date = datetime.date(int(year), int(month), int(day))
    print(real_date)
ndclt
  • 2,590
  • 2
  • 12
  • 26
0

Ended up getting it working by making these changes:

import pandas as pd
import glob
import datefinder
from datetime import datetime
#get all Excel files within folder
path = r"C:\Users\me\Documents\Events\Spreadsheets\Consolidated\*.xlsx"
filename = glob.glob(path)
#get dates from filenames and convert to datetime objects
event_dates = (datefinder.find_dates(f) for f in filename)
event_dates_dto = []
for dates in event_dates:
    event_dates_dto.append(pd.to_datetime(list(dates)))
#check output
print(event_dates_dto)

This now creates a list of DateTimeIndexes extracted from each filename which can be used as needed for the forward date column I need to construct, for indexing, etc.

Naeblis
  • 54
  • 8