
I have a date column (dtype `object`) in my df, formatted as shown below.

I need to convert this column, or create a new date column, that contains a 5-digit Julian Date in the format YYDDD. I'm not sure how to do this in Python/pandas.

Date:

```
2016-03-28
2016-03-11
2016-03-12
2016-03-23
2016-03-04
2016-03-02
2016-03-30
2016-03-30
2016-03-13
2016-03-13
```

Essentially, I want to create a new column, df['date2'], derived from the existing df['date'] column and holding each date in the desired format.

For example, when date is "2016-03-28", date2 should be 16088.
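As a quick check of the target value: 2016 is a leap year, so 28 March is day 31 + 29 + 28 = 88 of the year, giving YY = 16 and DDD = 088. A minimal sketch using only the standard-library `datetime` module confirms this for the single example string (the variable names are illustrative):

```python
from datetime import datetime

# Parse the example string from the question.
d = datetime.strptime("2016-03-28", "%Y-%m-%d")

# Day of year by hand: 31 (Jan) + 29 (Feb, 2016 is a leap year) + 28 (Mar) = 88
assert d.timetuple().tm_yday == 31 + 29 + 28

# %y -> two-digit year, %j -> zero-padded three-digit day of year
yyddd = d.strftime("%y%j")
print(yyddd)  # 16088
```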

nakedbird226
  • Here's a [question](https://stackoverflow.com/q/36156582/945456) where they add a new column based on another date/time value. It might help you get started. You might be able to use [`.dt.dayofyear`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.dayofyear.html#pandas.Series.dt.dayofyear). A little code to show us exactly where you're stuck (e.g. creating the new column vs. processing the date) would have been helpful. – Jeff B Jul 14 '17 at 13:50
  • Your idea of [Julian Date](https://en.wikipedia.org/wiki/Julian_day) is highly unconventional. – gboffi Jul 14 '17 at 13:51
  • Possible duplicate of [Extract day of year and Julian day from a string date in python](https://stackoverflow.com/questions/13943062/extract-day-of-year-and-julian-day-from-a-string-date-in-python) – Błotosmętek Jul 14 '17 at 13:54
  • What you need has a different name, unfortunately. – gboffi Jul 14 '17 at 14:27
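To illustrate the naming point raised in the comments: the conventional (astronomical) Julian Date counts days from noon on 1 January 4713 BC, and pandas exposes it via `Timestamp.to_julian_date()`. The sketch below compares it with the YYDDD value (two-digit year plus day of year) that the question actually asks for; the variable names are just for illustration.

```python
import pandas as pd

ts = pd.Timestamp("2016-03-28")

# Astronomical Julian Date: days elapsed since noon, 1 January 4713 BC.
jd = ts.to_julian_date()
print(jd)  # 2457475.5

# The YYDDD value in the question is something else: two-digit year + day of year.
yyddd = ts.strftime("%y%j")
print(yyddd)  # 16088
```

The two values have nothing in common beyond describing the same day, which is why searching for "Julian date" in the pandas docs leads to `to_julian_date` rather than to the YYDDD format.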

2 Answers

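You can use `strftime` with the format `'%y%j'`, where `%y` is the two-digit year and `%j` is the zero-padded day of the year (001-366); see http://strftime.org/ for the full list of format codes: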

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2016-03-28', periods=5)})
print(df)
```
```
        date
0 2016-03-28
1 2016-03-29
2 2016-03-30
3 2016-03-31
4 2016-04-01
```
```python
# %y = two-digit year, %j = zero-padded day of year
df['newFormat'] = df['date'].dt.strftime('%y%j')
print(df)
```
```
        date newFormat
0 2016-03-28     16088
1 2016-03-29     16089
2 2016-03-30     16090
3 2016-03-31     16091
4 2016-04-01     16092
```
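The question says the column has dtype `object`, i.e. it holds strings, while the `.dt` accessor only works on datetime columns. A minimal sketch, assuming the strings look like the sample in the question, parses them with `pd.to_datetime` first; the `date2_int` column is a hypothetical extra for the case where an integer such as `16088` is wanted rather than the string `'16088'`:

```python
import pandas as pd

# Sample data from the question: dates stored as strings (dtype object).
df = pd.DataFrame({'date': ['2016-03-28', '2016-03-11', '2016-03-12', '2016-03-23',
                            '2016-03-04', '2016-03-02', '2016-03-30', '2016-03-30',
                            '2016-03-13', '2016-03-13']})

# .dt only works on datetime64 columns, so parse the strings first.
# .str.strip() is defensive, in case the values carry stray whitespace.
parsed = pd.to_datetime(df['date'].str.strip(), format='%Y-%m-%d')

# Zero-padded YYDDD as a string, e.g. '16088'.
df['date2'] = parsed.dt.strftime('%y%j')

# Hypothetical integer version, if a number like 16088 is preferred.
df['date2_int'] = df['date2'].astype(int)
print(df)
```

Keeping the string form is the safer default: as an integer, dates in 2000-2009 lose their leading zero.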
jezrael

I'm no Python/pandas expert, but the following may work for you:

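```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2016-03-28', periods=5)})
# Day of year, zero-padded to three digits ('088')
df['paddedDay'] = df['date'].dt.dayofyear.apply(lambda x: str(x).zfill(3))
# Last two digits of the year ('16')
df['lastTwoOfYear'] = df['date'].dt.year.apply(lambda x: str(x)[-2:])
# Concatenate the two strings into YYDDD ('16088')
df['newFormat'] = df['lastTwoOfYear'].map(str) + df['paddedDay']
```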

This produces the following.

```
        date paddedDay lastTwoOfYear newFormat
0 2016-03-28       088            16     16088
1 2016-03-29       089            16     16089
2 2016-03-30       090            16     16090
3 2016-03-31       091            16     16091
4 2016-04-01       092            16     16092
```

I guarantee there's a better way of doing this (surely there's a way to avoid the temporary columns). This is the first time I've ever written anything for pandas, so please excuse the mess. If anyone is tempted to downvote, I'd challenge them to show us a better solution; I'd love to see the correct way to do this :)
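One way to avoid the temporary columns, sketched here under the assumption that the column is already datetime64, is to build YYDDD arithmetically from `.dt.year` and `.dt.dayofyear` (the accessor suggested in the comments). The column name `julianInt` and the `early` series are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2016-03-28', periods=5)})

# YYDDD as a number: two-digit year shifted three places left, plus day of year.
# 2016-03-28 -> (2016 % 100) * 1000 + 88 = 16088
df['julianInt'] = (df['date'].dt.year % 100) * 1000 + df['date'].dt.dayofyear

# As a string; zfill(5) restores the leading zero an integer drops for 2000-2009.
df['newFormat'] = df['julianInt'].astype(str).str.zfill(5)
print(df)

# Leading-zero case: 1 February 2005 is day 32, so the integer is 5032.
early = pd.Series(pd.to_datetime(['2005-02-01']))
early_yyddd = ((early.dt.year % 100) * 1000 + early.dt.dayofyear).astype(str).str.zfill(5)
print(early_yyddd.iloc[0])  # 05032
```

The arithmetic form is handy when an integer is the real goal; for a string, `dt.strftime('%y%j')` from the other answer does the padding in one step.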

Jeff B
  • Thanks for the help. This is useful to see how you broke things out and put the pieces together to form the desired format. – nakedbird226 Jul 14 '17 at 15:28
  • @nakedbird226 You're welcome. That being said, jezrael's answer using `strftime` is a much better answer and should be accepted ;) – Jeff B Jul 14 '17 at 15:29