0

I ran a basic select statement in hive expecting 20211227 but hive return 20221227.

select date_format('2021-12-27','YYYYMMdd')

I ran the statement below and it gives me expected result 20211225. Not sure, why hive behaves this way.

select date_format('2021-12-25','YYYYMMdd')

Chris Albert
  • 2,462
  • 8
  • 27
  • 31
  • try this one select date_format('2021-12-27','dd-MM-YYYY'); if wont work then maybe check timezone settings. – Yusuf Jan 12 '22 at 00:50
  • Correct format should be `yyyyMMdd`. Hive follows java date format. Now why first sql isnt working, may be some issues with auto string to date conversion. – Koushik Roy Jan 12 '22 at 03:57
  • See this answer: https://stackoverflow.com/a/69840917/2700344 – leftjoin Jan 12 '22 at 07:13

1 Answers1

1

hive uses Java simpledateformat. As per Java guide,

  • y (lowercase) is year
  • Y (uppercase) is 'week-based-year'

This difference will cause your code to work perfectly fine, except for when dealing with dates at the very end of some years. So, when you use ('2021-12-27','YYYYMMdd'), yyyy will output 2021 but YYYY will output 2022. Because the week that the 27th of December falls in the first week of 2022.
date_format('2021-12-25','YYYYMMdd') is working because 25th Dec is not the first week of 2022.
Refer to below screenshot, you can see lowercase y is giving you correct result and uppercase is picking up year from first week of year which is 2022.

example

Please always use yyyy as per hive docs says.

Koushik Roy
  • 6,868
  • 2
  • 12
  • 33