7

I have a date column in a pandas DataFrame containing Julian dates. How can I convert these Julian dates to mm-dd-yyyy format?

Sample data

     ORG   CHAIN_NBR  SEQ_NBR     INT_STATUS  BLOCK_CODE_1  DATA_BLOCK_CODE_1
0   523         1        0          A             C             2012183
1   523         2        1          I             A             2013025
2   521         3        1          A             H             2007067
3   513         4        1          D             H             2001046
4   513         5        1          8             I             2006075

I was using the jd2gcal function, but it's not working. I also tried writing code like this, but to no avail.

for i,row in amna.iterrows():
    amna['DATE_BLOCK_CODE_1'] = datetime.datetime.strptime(row['DATE_BLOCK_CODE_1'], '%Y%j')

Desired output:

    ORG   CHAIN_NBR  SEQ_NBR     INT_STATUS  BLOCK_CODE_1  DATA_BLOCK_CODE_1
0   523         1        0          A             C             mm-dd-yyyy
1   523         2        1          I             A             mm-dd-yyyy
2   521         3        1          A             H             mm-dd-yyyy
3   513         4        1          D             H             mm-dd-yyyy
4   513         5        1          8             I             mm-dd-yyyy

Please help me with this.

martineau
DPs

4 Answers

4

Let's try something like this:

# Start of the year plus (day of year - 1) days, so that day 001 is January 1.
# Assumes DATA_BLOCK_CODE_1 is an integer column.
df['New Date'] = (pd.to_datetime((df.DATA_BLOCK_CODE_1 // 1000).astype(str), format='%Y') +
                  pd.to_timedelta(df.DATA_BLOCK_CODE_1 % 1000 - 1, unit='D'))

print(df)

   ORG  CHAIN_NBR  SEQ_NBR INT_STATUS BLOCK_CODE_1  DATA_BLOCK_CODE_1   New Date
0  523          1        0          A            C            2012183 2012-07-01
1  523          2        1          I            A            2013025 2013-01-25
2  521          3        1          A            H            2007067 2007-03-08
3  513          4        1          D            H            2001046 2001-02-15
4  513          5        1          8            I            2006075 2006-03-16
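As a sanity check on the day-of-year arithmetic, here is a small sketch that compares the year-plus-offset calculation with the standard library's %j parsing for the five sample values. It assumes day 001 means January 1, so the offset is the day of the year minus one; the variable names are only illustrative.

```python
import datetime

import pandas as pd

# The five sample values from the question, as integers (yyyyddd).
codes = pd.Series([2012183, 2013025, 2007067, 2001046, 2006075])

# Start of the year plus (day of year - 1) days.
by_arithmetic = (
    pd.to_datetime((codes // 1000).astype(str), format="%Y")
    + pd.to_timedelta(codes % 1000 - 1, unit="D")
)

# Reference: let strptime parse the year and day of year directly.
by_strptime = [datetime.datetime.strptime(str(c), "%Y%j") for c in codes]

print(by_arithmetic.dt.strftime("%Y-%m-%d").tolist())
print([ts.to_pydatetime() for ts in by_arithmetic] == by_strptime)
```

If the two disagree for your data, the most likely cause is a different convention for what the last three digits mean.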
Scott Boston
3
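# Day of the year without leading zeros; expand=False makes extract return a Series.
julian = df.DATA_BLOCK_CODE_1.str[4:].str.extract("([1-9][0-9]?[0-9]?)", expand=False)
# Build "yyyy-ddd" strings, then parse year and day of year.
df["DATA_BLOCK_CODE_1"] = df.DATA_BLOCK_CODE_1.str[:4] + "-" + julian
df['DATA_BLOCK_CODE_1'] = pd.to_datetime(df['DATA_BLOCK_CODE_1'], format='%Y-%j')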

The regular expression requires the day-of-year digits to start with any digit other than 0, which strips the leading zeros. (I assume DATA_BLOCK_CODE_1 is of string type.)
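Since %j also accepts zero-padded day numbers, the strings can probably be parsed in one step, and dt.strftime then gives the mm-dd-yyyy text the question asks for. This is a minimal sketch, assuming a string column; the three-row frame is made up from the sample data.

```python
import pandas as pd

# Hypothetical frame holding a few of the sample values as strings.
df = pd.DataFrame({"DATA_BLOCK_CODE_1": ["2012183", "2013025", "2007067"]})

# %Y consumes the four-digit year, %j the three-digit day of the year.
parsed = pd.to_datetime(df["DATA_BLOCK_CODE_1"], format="%Y%j")

# Format as mm-dd-yyyy; note that the result is text, not datetime64.
df["DATA_BLOCK_CODE_1"] = parsed.dt.strftime("%m-%d-%Y")

print(df)
```

Keep the parsed datetime column around if you still need to sort or do date arithmetic, because the formatted strings no longer behave like dates.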

Tai
1

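The pandas to_datetime function has an origin parameter for Julian-to-datetime conversion:

amna['Date'] = pd.to_datetime(amna['DATA_BLOCK_CODE_1'], unit='D', origin='julian')

Note that you must include the unit parameter, and it must be set to days ('D').

Keep in mind that origin='julian' expects true Julian Day numbers (a running count of days starting at noon on January 1, 4713 BC). The values in the question are a year followed by a day of the year (yyyyddd), so they need the split-and-add approach from the edit below, which produces:

print(amna)

   ORG  CHAIN_NBR  SEQ_NBR INT_STATUS BLOCK_CODE_1  DATA_BLOCK_CODE_1       Date
0  523          1        0          A            C            2012183 2012-07-01
1  523          2        1          I            A            2013025 2013-01-25
2  521          3        1          A            H            2007067 2007-03-08
3  513          4        1          D            H            2001046 2001-02-15
4  513          5        1          8            I            2006075 2006-03-16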
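To illustrate what origin='julian' works with, here is a small sketch using a true Julian Day number. It assumes the input is a continuous day count in which 2451545.0 corresponds to noon on January 1, 2000; the variable names are illustrative.

```python
import pandas as pd

# A true Julian Day number: 2451545.0 is noon on 2000-01-01.
jd = pd.Series([2451545.0])

# unit must be 'D' when origin='julian'.
converted = pd.to_datetime(jd, unit="D", origin="julian")
print(converted)

# Round trip back to a Julian Day number.
print(converted.iloc[0].to_julian_date())
```

A yyyyddd value such as 2012183 is not on this scale, which is why it has to be split into a year and a day of the year instead.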

Edit 2020-06-27

In response to Sami Navesi's comment, if the DATA_BLOCK_CODE_1 column holds yyyyddd values as integers or strings, you can split off the year (converting to string first if needed) and convert the day of the year separately, as shown in this answer. This example assumes DATA_BLOCK_CODE_1 is a string column. The results are the same as above.

amna['Date'] = [pd.to_datetime(e[:4]) + pd.to_timedelta(int(e[4:]) - 1, unit='D') for e in amna['DATA_BLOCK_CODE_1']]
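If the column mixes integers, strings, and unusable entries, a vectorized call may be easier to maintain than the list comprehension. The sketch below swaps in a different technique, parsing with format='%Y%j' instead of splitting, and uses errors='coerce' to turn unparseable values into NaT. The sample frame, including the "N/A" entry, is hypothetical.

```python
import pandas as pd

# Hypothetical column with an integer, a string, and a bad value.
amna = pd.DataFrame({"DATA_BLOCK_CODE_1": [2012183, "2013025", "N/A"]})

# astype(str) lets the same code handle integer and string entries.
text = amna["DATA_BLOCK_CODE_1"].astype(str)

# Entries that do not match yyyyddd become NaT instead of raising.
amna["Date"] = pd.to_datetime(text, format="%Y%j", errors="coerce")

print(amna)
```

Checking amna["Date"].isna() afterwards is a quick way to find the rows that could not be converted.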

deekay
0
df.createOrReplaceTempView("df")

new_df = spark.sql("select *, from_unixtime(unix_timestamp(cast(DATA_BLOCK_CODE_1 as string),'yyyyDDD'),'yyyy-MM-dd') from df")

new_df.show()
Rahmat Waisi
Ankita