1

I am pulling a chunk of data within a range of time. It is pulling date and times from column recvd_dttm. It takes all the data starting from a year ago. I want to modify it so that it can pull a month or a day, but pd.DateOffset(months=1) is giving a KeyError:1 error. I get the same error if I change it to days=7. But it works just fine with years=1. What is going on here?

df = pd.read_csv('MYDATA.csv')

# filter by countries with at least one medal and sort
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'])

#Only retrieve data before now (ignore typos that are future dates)
mask = df['recvd_dttm'] <= datetime.datetime.now()
df = df.loc[mask]
# get first and last datetime for final week of data

range_max = df['recvd_dttm'].max()
range_min = range_max - pd.DateOffset(years=1)

# take slice with final week of data
df = df[(df['recvd_dttm'] >= range_min) & 
               (df['recvd_dttm'] <= range_max)]

EDIT: The problem was coming from elsewhere in the code!

jenryb
  • 2,017
  • 12
  • 35
  • 72
  • can you clarify what your end purpose is? is it to get the max values for every year? – AZhao Jul 14 '15 at 16:32
  • No. The end purpose is to have a slice of data from a certain range that a user inputs. They pick a day, a month, a year, and it pulls that amount of data from the csv. – jenryb Jul 14 '15 at 16:35
  • what about the range_max and range_min lines? what are you hoping those are trying accomplish? are those just the start and stops of the chosen range? – AZhao Jul 14 '15 at 16:43

4 Answers4

1

Have you tried being more explicit with what pd.DateOffset is acting on?

For example:

range_max = df['recvd_dttm'].max()
range_min = range_max - (df['recvd_dttm']+pd.DateOffset(years=1))

Then substitute month and days values.

AZhao
  • 13,617
  • 7
  • 31
  • 54
  • The thing is that pd.DateOffset(years=1) is working just fine. Substituting months and days is causing errors. Using your suggestion I got the error TypeError: cannot use a non-absolute DateOffset in datetime/timedelta operations [] Is my syntax wrong? – jenryb Jul 14 '15 at 16:21
  • hm can you provide a single row of data for me to play with? – AZhao Jul 14 '15 at 16:26
  • I have an example column for you. Hope that helps! – jenryb Jul 14 '15 at 16:36
  • for your example column i'm able to use your original code without a problem. the days and months are working. could it be a data issue? – AZhao Jul 14 '15 at 16:45
  • Wow, yes, it is a data issue. Thanks for helping me realize that. The original code later goes through by months that year, but obviously if you take a month of code, no January is available. I have to work on that now, thank you! – jenryb Jul 14 '15 at 16:51
  • If you're interested, the question for fixing this is here: http://stackoverflow.com/questions/31412541/modifying-code-to-work-for-month-and-week-instead-of-year – jenryb Jul 14 '15 at 16:52
1

You can use the offset family from pd.tseries.offsets. Below is the sample code.

import pandas as pd
import datetime

# your data
# ================================
df = pd.read_csv('/home/Jian/Downloads/MOCK_DATA.csv', usecols=[1, 4])

df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'])
mask = df['recvd_dttm'] <= datetime.datetime.now()
df = df.loc[mask]


# flexible offsets
# =======================================
print(range_max)

2015-07-14 16:52:58

# for 1 month: currently there is a bug
# range_min_month = range_max - pd.tseries.offsets.MonthOffset(1)

# for 1 week
range_min_week = range_max - pd.tseries.offsets.Week(1)
print(range_min_week)

2015-07-07 16:52:58

# for 5 days
range_min_day = range_max - pd.tseries.offsets.Day(5)
print(range_min_day)

2015-07-09 16:52:58
Jianxun Li
  • 24,004
  • 10
  • 58
  • 76
  • Thank you Jianxun! I actually realized that this question was pretty stupid, as the issue was elsewhere in the code. – jenryb Jul 15 '15 at 16:50
0

Try using timedelta instead of DateOffset

ah bon
  • 9,293
  • 12
  • 65
  • 148
  • I should have written this as something I've tried. I don't want to do timedelta(days=30) or anything for a month because it is inaccurate for some months. If I try months=1 I get TypeError: 'months' is an invalid keyword argument for this function. Same for years=1 – jenryb Jul 14 '15 at 16:01
0

Have you consider using Unix Epoch Time instead of a date formatted in a lesser manner? There is a well documented answer for converting to Unix Time, and dealing with the sort of offset in the question seems like it would be a lot easier as sliding ranges are simpler to implement with a more or less continuous sequence of real numeric values.

Community
  • 1
  • 1
Matt
  • 545
  • 3
  • 16