27

I have a log file which contains timestamp column. The timestamp is in unix epoch time format.

I want to create a partition based on a timestamp with partitions year, month and day.

So far I have done this but it is throwing an error.

PARSE ERROR cannot recognize input '(' in column type

Here is my code.

from (
      from raw_data
            MAP  ${PREFIX}raw_data.line
            USING 's3://scripts/clean.py'
            AS (timestamp STRING, name STRING)
      ) map_out
INSERT OVERWRITE TABLE date_base_data_temp PARTITION(year(timestamp), month(timestamp)), day(timestamp))) 
    select map_out.name;
Chris Gerken
  • 16,221
  • 6
  • 44
  • 59
priyank
  • 4,634
  • 11
  • 45
  • 52

5 Answers5

59

Oof, that looks ugly. Try using this function in Hive:

SELECT from_unixtime(unix_timestamp) as new_timestamp from raw_data ...

Or if timestamp is in ms instead of seconds:

SELECT from_unixtime(unix_timestamp DIV 1000) as new_timestamp from raw_data ...

That converts a unix timestamp into a YYYY-MM-DD HH:MM:SS format, then you can use the following functions to get the year, month, and day:

SELECT year(new_timestamp) as year, month(new_timestamp) as month, day(new_timestamp) as day ...
chhantyal
  • 11,874
  • 7
  • 51
  • 77
Aswin
  • 658
  • 6
  • 2
12

With more recent releases of Hive and SparkSQL, data type of date and type casting options are available. Following should work in Hive as well as Spark SQL

SELECT cast(from_unixtime(epoch_datetime) as date) from myHiveTable
inder
  • 131
  • 1
  • 4
10

If you need to convert the date in custom format, use this:

select date_format(from_unixtime(epoch_datetime),'yyyyMM') as formatted_date from myHiveTable;


which will return the date as yearMonth e.g. 201708

Abdul Mannan
  • 1,072
  • 12
  • 19
4

Adding this query to the list where the timestamp needs to be converted to date string yyyy-MM-dd for a string partition:

hive> select date_format(from_unixtime(epoch_datetime), 'yyyy-MM-dd') as day from table_name limit 20;

-- If required, remove the millis precision for timestamps
hive> select date_format(from_unixtime(cast(epoch_datetime/1000 as bigint)), 'yyyy-MM-dd') as day from table_name limit 20;
Yash Sharma
  • 1,674
  • 2
  • 16
  • 23
  • 2
    `date_format` is not necessary, `from_unixtime` can receive another format argument: `select from_unixtime(epoch_datetime, 'yyyy-MM-dd') as day from table_name limit 20;` https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF – Julian Qian Sep 05 '18 at 15:44
  • @JulianQian You are correct. I used your more succinct version, and it works fine. – NYCeyes Feb 02 '19 at 03:14
-1
select order_id, date_format(from_unixtime(order_date/1000),'yyy-MM-dd') as order_date ,order_customer_id,order_status
from orders

or if you see any error on the same , try to use select order_id, date_format(from_unixtime(order_date DIV 1000),'yyy-MM-dd') as order_date ,order_customer_id,order_status from orders