1

I'm a Pandas newbie but decent at SQL. A function I often leverage in SQL is this:

YEAR(date_format_data) = (YEAR(GETDATE())-1) 

This will get me all the data from last year. Can someone please help me understand how to do the equivalent in Pandas?

Here's some example data:

Date     Number
01/01/15 1
01/02/15 2
01/01/15 3
01/01/16 2
01/01/16 1

And here's my best guess at the code:

df = df[YEAR('Date') == (YEAR(GETDATE()) -1)].agg(['sum'])

And this code would return a value of '3'.

Thank you in advance for your help, I'm having a really hard time figuring out what I'm sure is simple.

Me

Programming_Learner_DK
  • 1,509
  • 4
  • 23
  • 49
  • You may need to first convert your `Date` column to datetime objects with `df.Date = pd.to_datetime(df.Date)`. – Abdou Jun 26 '17 at 13:01
  • If you're using `pd.to_datetime` performance might be an issue if the date isn't in a standard format. [This](https://stackoverflow.com/questions/29882573/pandas-slow-date-conversion) might be useful then. – akilat90 Jun 26 '17 at 13:44

2 Answers2

2

I think you can do it this way:

prev_year = pd.datetime.today().year - 1
df.loc[df['Date'].dt.year == prev_year]

PS .dt.year accessor will work only if Date column is of datetime dtype. If it's not the case you may want to convert that column to datetime dtype first:

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
0

For pandas, first convert your date column to timestamp by pd.to_datetime

df['Date2'] = pd.to_datetime(df['Date'])

(pd.to_datetime has a format parameter to specify your input date format) Then you have

df['Date2'].year
jf328
  • 6,841
  • 10
  • 58
  • 82