4

I am trying to create a new table with the existing table using Hive query.
While creating a new table I want to add a new column and insert current timestamp for all the rows. For example:

Exiting table:

|user_id|user_name|user_address|
|1001   |userName |address     |


New table:

|user_id|user_name|user_address|creation_date|
|1001   |userName |address     | 123421342134|

I am trying to execute this query using JDBCTemplate.

What I have tried:

Create table newTable Select * from existingtable;

But that would copy create new table with old values, I want to add new column and insert values during the table creation.



Please help.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
mayank bisht
  • 618
  • 3
  • 14
  • 43

3 Answers3

5

You can add new column like this:

create table newTable AS
select s.*, unix_timestamp(current_timestamp) as creation_date 
  from existingtable s;
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Thanks for the answer, getting an exception: failed: ParseException, cannot recognize input near 'select' 'unix_timestamp'. – mayank bisht May 21 '20 at 08:41
  • @mayankbisht fixed! it was extra `select` – leftjoin May 21 '20 at 09:03
  • Yes, now the query is working fine, but I could see all the same value inside creation_Date. I am looking for an identity kind of value.I have to sort using that column. – mayank bisht May 21 '20 at 09:09
  • @mayankbisht Identity is another complex subject. unix_timestamp() without arguments will give you different timestamps but they are not unique, it will be executed in different containers in parallel and will generate sometimes the same. If you want absolutely unique identifier - then use UUID https://stackoverflow.com/a/42314282/2700344 - it works good in distributed environment. – leftjoin May 21 '20 at 09:33
  • @mayankbisht If you want it sortable - comparable like bigint ID or timestamp then it is not easy to generate such id for big dataset. For example you can use `row_number over()` to generate unique ID, but it will run SLOW in SINGLE reducer container – leftjoin May 21 '20 at 09:33
  • @mayankbisht read this: https://stackoverflow.com/a/55105514/2700344. If you on Hive 3.0 you can use surrogate_key function. – leftjoin May 21 '20 at 09:38
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/214327/discussion-between-mayank-bisht-and-leftjoin). – mayank bisht May 21 '20 at 10:17
0

You can run a hive one-shot command for just one hive execution.

This utility is really helpful when you need to run more than just one query execution or to run bigger queries that take a long time execution.

  • create a myquery.hql file with all the queries delimited by ; and then run from hdfs nohup hive -f /myquery.hql

You can keep looking at the console by accessing the nohup.out file:

tail -f nohup.out

Kenry Sanchez
  • 1,703
  • 2
  • 18
  • 24
0

If you need a new table defined with exactly the same structure as an existing table, then Hive make it very easy to create the new table. This is called cloning a table, and it’s done using the LIKE clause. The new table will have the same column definitions and other properties as the existing table, but no data. The syntax is

CREATE TABLE new_table_name LIKE existing_table_name;

CREATE TABLE jobs_archived LIKE jobs;

It is possible to specify a few of the table properties for the new table by including the appropriate clauses in the CREATE TABLE … LIKE statement. For example LOCATION and STORED AS clauses can be used. If you need to change other properties, use ALTER TABLE after creating the table to set those properties.

Afterwards, if you need add more columns you can use ALTER TABLE statement.

You can add one or more columns to the end of the column list using ADD COLUMNS, The general syntax is

ALTER TABLE tablename ADD COLUMNS (col1 TYPE1,col2 TYPE2,… );

ALTER TABLE employees ADD COLUMNS (bonus INT);

You can wrap this sentences in your JDBC statements.

I hope this helps.

Chema
  • 2,748
  • 2
  • 13
  • 24