1

I've extracted the dates from filenames in a set of Excel files into a list of DateTimeIndex objects. I now need to write the extracted date from each to a new date column for the dataframes I've created from each Excel sheet. My code works in that it writes the the new 'Date' column to each dataframe, but I'm unable to convert the objects out of their generator object DateTimeIndex format and into a %Y-%m-%d format.

Link to code creating the list of DateTimeIndexes from the filenames: How do I turn datefinder output into a list?

Code to write each list entry to a new 'Date' column in each dataframe created from the spreadsheets:

for i in range(0, len(df)):
    df[i]['Date'] = (event_dates_dto[i] for frames in df)

The involved objects:

type(event_dates_dto)
<class 'list'>

type(event_dates_dto[0])
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>

event_dates_dto
[DatetimeIndex(['2019-03-29'], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2019-04-13'], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2019-05-11'], dtype='datetime64[ns]', freq=None)]

The dates were extracted using datefinder: http://www.blog.pythonlibrary.org/2016/02/04/python-the-datefinder-package/

I've tried using methods here that seemed like they could make sense but none of them are the right ticket: Keep only date part when using pandas.to_datetime

Again, the simple for function is working correctly, but I'm unsure how to coerce the generator object into the correct format so that it not only writes to the new 'Date' column but also so that it is is in a useful '%Y-%m-%d' format that makes sense within the dataframe. Any help is greatly appreciated.

Naeblis
  • 54
  • 8

2 Answers2

2
  1. force evaluation with a one line loop like dates = [_ for _ in matches]
  2. convert the index to a column using the .index (or .reset_index() if you don't need to keep it)
  3. convert the column to datetime using pd.to_datetime() . use the .dt.date object of the datetime column to convert to Y-m-d

Here's a sample

import datefinder
import pandas as pd

data = '''Your appointment is on July 14th, 2016 15:24. Your bill is due 05/05/2016 16:00'''
matches = datefinder.find_dates(data)
# force evaluation with 1 line loop
dates = [_ for _ in matches] # 'dates = list(matches)' also works
df = pd.DataFrame({'dt_index':dates,'value':['appointment','bill']}).set_index('dt_index')
df['date'] = df.index
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.date
df

which gives

value   date
dt_index        
2016-07-14 15:24:00 appointment 2016-07-14
2016-05-05 16:00:00 bill    2016-05-05

Edit: Edited to account for forced evaluation

rhedak
  • 399
  • 4
  • 13
  • Your suggestion seems to be working in terms of process, but the DateTimeIndexes aren't being pulled correctly and all end up as 1970-01-01 as the .index step is just setting them back to 0-len(df). You can see in the code above what my DateTimeIndexes start out as in the last line. The problem is that my list containing the DateTimeIndexes is still a list of generators and they need to be converted. I've searched the specific error and what I can find on datefinder but gotten no help. I'll link the code that creates them in an edit above. – Naeblis Sep 05 '19 at 21:46
  • You need to force evaluation of the generator object, e.g. with a one line loop like dates = [ _ for _ in matches ] I edited the answer to account for that – rhedak Sep 06 '19 at 02:06
  • I got the extraction to strings that I needed working. I'm not going to mess with it anymore until tomorrow because it's baked my noodle really thoroughly already, but I think I'll end up using part of the code order you posted. I'll have to see what makes the most sense once I get the strings moved into the 'Date' column, but I think it should be a straightforward set_index from there, and then I can concatenate the separate frames and make a new CSV from them. Thanks for the input, I'll update if there are more issues. – Naeblis Sep 06 '19 at 02:14
  • Got it fixed. Careless mistake of trying to reference the iterator within an unnecessary list comprehension. Just needed to use .append and i as a locator. – Naeblis Sep 07 '19 at 23:36
  • if it's working then that is fine I guess. Are you going to accept the answer? – rhedak Sep 09 '19 at 01:44
1

A minor fix got it working, I was just trying to carry out too much at once and overthinking it.

#create empty list and append each date 
event_dates_transfer = []
#use .strftime('%Y-%m-%d') method on event_dates_dto here if you wish to return a string instead of a datetimeindex
for i in range(0,len(event_dates_dto)):
    event_dates_transfer.append(event_dates_dto[i][0])

#Create a 'Date' column for each dataframe correlating to the filename it was created from and set it as the index
for i in range(0, len(df)):
    new_date = event_dates_transfer[i]
    df[i]['Date'] = new_date
    df[i].set_index('Date', inplace=True)
Naeblis
  • 54
  • 8