2

I have two tables A and B. Both has same column names. I want to merge these two table and load it to table C. Table C also have same column names as A and B and one more column in timestamp (this for capturing the merged time). I dont want duplicates in Table C. I tried union but getting duplicate values because one of the column in Table C in Timestamp data type.

For Example, below is my sample query

insert overwrite table TableC 
select field1,field2, unix_timestamp() as field3 from table_A
UNION
select field1,field2, unix_timestamp() as field3 from table_B

The two unix_timestamp() function returns different timestamps (just a milli second difference) and I am getting duplicate data because of the timestamp.

Is there anyother way to get the same timestamp for both functions while union ?

  • You can insert NULL in the timestamp field and then execute: update TableC set field3=unix_timestamp() where field3 is null – Giacomo Degli Esposti Dec 20 '16 at 09:45
  • @GiacomoDegliEsposti Thanks for your suggestion . I myself found the workaround. i tried like below and it worked. `insert overwrite table TableC select T1.field1,T1.field2,unix_timestamp() as field3 from (select field1,field2 from table_A UNION select field1,field2 from table_B) AS T1` – Padmanabhan Vijendran Dec 20 '16 at 09:52

1 Answers1

3

unix_timestamp()
Gets current Unix timestamp in seconds.
This function is non-deterministic and prevents proper optimization of queries -
this has been deprecated since 2.0 in favour of CURRENT_TIMESTAMP

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

insert overwrite table TableC 
select field1,field2, unix_timestamp(current_timestamp) as field3 from table_A
UNION
select field1,field2, unix_timestamp(current_timestamp) as field3 from table_B

Additional work-arounds

insert overwrite table TableC 

select      field1,field2,unix_timestamp() as field3

from        (         select field1,field2 from table_A
            union all select field1,field2 from table_B
            ) t

group by    field1,field2

or

insert overwrite table TableC 

select      field1,field2,unix_timestamp() as field3

from        (     select field1,field2 from table_A
            union select field1,field2 from table_B
            ) t
David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88