10

The problem is in line 22 :

if start_date <= data_entries.iloc[j, 1] <= end_date:

where I want to compare the start_date and end_date portion to data_entries.iloc[j, 1] which is accessing a column of the pandas dataframe. I converted the column to datetime using,

data_entries['VOUCHER DATE'] = pd.to_datetime(data_entries['VOUCHER DATE'], format="%m/%d/%Y")

But I am unsure how to convert it to date.

import pandas as pd
import datetime

entries_csv = "C:\\Users\\Pops\\Desktop\\Entries.csv"

data_entries = pd.read_csv(entries_csv)
data_entries['VOUCHER DATE'] = pd.to_datetime(data_entries['VOUCHER DATE'], format="%m/%d/%Y")

start_date = datetime.date(2018, 4, 1)
end_date = datetime.date(2018, 10, 30)

    for j in range(0, len(data_entries)):
        if start_date <= data_entries.iloc[j, 1] <= end_date:
             print('Hello')
moshevi
  • 4,999
  • 5
  • 33
  • 50
Pherdindy
  • 1,168
  • 7
  • 23
  • 52
  • please provide [mcve](https://stackoverflow.com/help/mcve). – moshevi Jul 23 '18 at 08:27
  • Check this. https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior – pratik mankar Jul 23 '18 at 08:32
  • @pratikmankar I was referring to this earlier actually. My problem is the `pd.to_datetime()` of pandas. I believe there is no `pd.to_date()` equivalent. The link below only converts to datetime, but based on my code, it does not work with `datetime.date()` when you try to compare them. https://stackoverflow.com/questions/16852911/how-do-i-convert-dates-in-a-pandas-data-frame-to-a-date-data-type – Pherdindy Jul 23 '18 at 08:41
  • 1
    you could `data_entries['VOUCHER DATE'] = pd.to_datetime(data_entries['VOUCHER DATE'], format="%m/%d/%Y").dt.date` – moshevi Jul 23 '18 at 08:44
  • @MarcSantos - It is pandas version related problem, you can check [this](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0231-fixed-regressions). – jezrael Jul 23 '18 at 08:48
  • @moshevi it works thanks – Pherdindy Jul 23 '18 at 08:50
  • @jezrael will look into it thanks. – Pherdindy Jul 23 '18 at 08:50

2 Answers2

10

Just use pd.Timestamp objects without any conversion:

start_date = pd.Timestamp('2018-04-01')
end_date = pd.Timestamp('2018-10-30')

res = data_entries[data_entries['VOUCHER DATE'].between(start_date, end_date)]

Explanation

Don't use datetime.datetime or datetime.date objects in Pandas series. This is inefficient because you lose vectorised functionality. The benefit of pd.Timestamp objects is you can utilize vectorised functionality for calculations. As described here:

numpy.datetime64 is essentially a thin wrapper an int64. It has almost no date/time specific functionality.

pd.Timestamp is a wrapper around a numpy.datetime64. It is backed by the same int64 value, but supports the entire datetime.datetime interface, along with useful pandas-specific functionality.

jpp
  • 159,742
  • 34
  • 281
  • 339
8

this converts it to date:

data_entries['VOUCHER DATE'] = pd.to_datetime(data_entries['VOUCHER DATE'], format="%m/%d/%Y").dt.date

however i would not recommend filtering like this. this is much faster

data_entries[data_entries['VOUCHER DATE'].between(start_date, end_date)]

read this article

moshevi
  • 4,999
  • 5
  • 33
  • 50