Table-to-table insert without duplicates in Hive

Question

Table A is truncated and reloaded from each monthly file, and table B is append-only.

So table A is loaded from the file into Hive, and table B is filled by inserting (appending) the data from table A.

The issue is that table B is loaded with a straight INSERT ... SELECT from table A, so the same data can end up being inserted twice.

How should I write the SELECT query that inserts data from table A? Both tables have a file_date column. A LEFT JOIN between A and B gives wrong counts in the inserted table.

Also, my NOT EXISTS query does not work in Hive.

The issue in detail:

Append table script (the target table is partitioned by yearmonth):

INSERT INTO TABLE dist.t2 SELECT Person_sk, Np_id, Yearmonth, Insert_date, File_date FROM raw.ma

Data in table raw.ma (truncated and reloaded for each file): File1 contains 201902 data, File2 contains 201903 data, and File3 contains 201904 data. If File4 contains 201902 data again, it must not duplicate File1's data: it should either not be inserted at all or overwrite that partition.

So I need a filter (a WHERE condition) when appending data into dist.t2.
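One common way to get that filter without NOT EXISTS is an anti-join: LEFT JOIN raw.ma to the distinct yearmonth values already in dist.t2 and keep only rows where the joined key IS NULL. Joining to the full dist.t2 instead of its distinct keys multiplies rows, and leaving out the IS NULL filter keeps matched rows too; either would be a plausible cause of the wrong counts. The sketch below is hedged: SQLite stands in for Hive so it can run locally, the attached databases named raw and dist only mimic the Hive schemas, the column types and sample rows are hypothetical, and it assumes yearmonth identifies one file's data (swap in file_date if that is the real key). In Hive the target line would be INSERT INTO TABLE dist.t2 PARTITION (yearmonth), with yearmonth last in the SELECT and hive.exec.dynamic.partition.mode=nonstrict.

```python
import sqlite3

# Assumption: SQLite stands in for Hive so the pattern can run locally.
# The two attached in-memory databases mimic the raw and dist schemas.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS raw")
conn.execute("ATTACH DATABASE ':memory:' AS dist")

# Column names follow the question; the types are assumed.
# yearmonth is last, as a Hive partition column would be.
COLS = "person_sk INTEGER, np_id TEXT, insert_date TEXT, file_date TEXT, yearmonth TEXT"
conn.execute(f"CREATE TABLE raw.ma ({COLS})")
conn.execute(f"CREATE TABLE dist.t2 ({COLS})")

# Anti-join: keep only raw.ma rows whose yearmonth is not yet in dist.t2.
# Joining to DISTINCT keys stops the join from multiplying rows, and the
# IS NULL filter drops every row that found a match.
# In Hive the first line would be: INSERT INTO TABLE dist.t2 PARTITION (yearmonth)
APPEND_NEW_MONTHS = """
INSERT INTO dist.t2
SELECT a.person_sk, a.np_id, a.insert_date, a.file_date, a.yearmonth
FROM raw.ma a
LEFT JOIN (SELECT DISTINCT yearmonth FROM dist.t2) t
  ON a.yearmonth = t.yearmonth
WHERE t.yearmonth IS NULL
"""


def load_file(rows):
    """Truncate-and-reload raw.ma, then append only new months to dist.t2."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM raw.ma")  # SQLite has no TRUNCATE
        conn.executemany("INSERT INTO raw.ma VALUES (?, ?, ?, ?, ?)", rows)
        conn.execute(APPEND_NEW_MONTHS)


def count_rows():
    return conn.execute("SELECT COUNT(*) FROM dist.t2").fetchone()[0]


# Usage with hypothetical rows: 201902 arrives twice, as in the question.
load_file([(1, "n1", "2019-03-01", "2019-02-28", "201902"),
           (2, "n2", "2019-03-01", "2019-02-28", "201902")])
load_file([(3, "n3", "2019-04-01", "2019-03-31", "201903")])
load_file([(1, "n1", "2019-05-01", "2019-02-28", "201902")])  # skipped
print(count_rows())  # 3
```

This option skips a repeated month entirely, so a corrected re-send of 201902 would be ignored rather than applied.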

Can you please help with this?

I tried ALTER TABLE ... DROP PARTITION in Hive, but it fails in the Spark framework.
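For the "overwrite that partition" option, an alternative to dropping partitions by hand is a dynamic-partition INSERT OVERWRITE, which in Hive replaces only the yearmonth partitions that appear in the SELECT output. The sketch below is hedged: SQLite has no INSERT OVERWRITE, so it emulates those semantics with a DELETE of the months present in raw.ma followed by an INSERT inside one transaction; the Hive statement it mirrors is shown in the comments, and the schema, types, and sample rows are hypothetical. If the job runs through Spark SQL against a datasource (non-Hive-serde) table, Spark 2.3+ may also need spark.sql.sources.partitionOverwriteMode=dynamic, because the default static mode can overwrite every partition.

```python
import sqlite3

# Assumption: SQLite stands in for Hive. The Hive statement being emulated is
#   SET hive.exec.dynamic.partition=true;
#   SET hive.exec.dynamic.partition.mode=nonstrict;
#   INSERT OVERWRITE TABLE dist.t2 PARTITION (yearmonth)
#   SELECT person_sk, np_id, insert_date, file_date, yearmonth FROM raw.ma;
# SQLite has no INSERT OVERWRITE, so the months present in raw.ma are
# deleted from dist.t2 and re-inserted inside one transaction.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS raw")
conn.execute("ATTACH DATABASE ':memory:' AS dist")

# Column names follow the question; the types are assumed.
COLS = "person_sk INTEGER, np_id TEXT, insert_date TEXT, file_date TEXT, yearmonth TEXT"
conn.execute(f"CREATE TABLE raw.ma ({COLS})")
conn.execute(f"CREATE TABLE dist.t2 ({COLS})")


def load_file(rows):
    """Truncate-and-reload raw.ma, then replace its months in dist.t2."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM raw.ma")
        conn.executemany("INSERT INTO raw.ma VALUES (?, ?, ?, ?, ?)", rows)
        # "Drop" only the partitions (yearmonth values) this file contains...
        conn.execute("DELETE FROM dist.t2 "
                     "WHERE yearmonth IN (SELECT DISTINCT yearmonth FROM raw.ma)")
        # ...then write them back from the fresh file.
        conn.execute("INSERT INTO dist.t2 SELECT * FROM raw.ma")


def rows_for(month):
    return conn.execute("SELECT COUNT(*) FROM dist.t2 WHERE yearmonth = ?",
                        (month,)).fetchone()[0]


# Usage with hypothetical rows: a re-sent 201902 file replaces the first one.
load_file([(1, "n1", "2019-03-01", "2019-02-28", "201902"),
           (2, "n2", "2019-03-01", "2019-02-28", "201902")])
load_file([(3, "n3", "2019-04-01", "2019-03-31", "201903")])
load_file([(1, "n1", "2019-05-01", "2019-02-28", "201902")])
print(rows_for("201902"), rows_for("201903"))  # 1 1
```

Unlike the anti-join, this keeps the latest copy of a month, and months not present in the current file are left untouched.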

Please help me avoid inserting duplicate entries.

Make it easy to assist you: [mcve]. – jarlh Dec 05 '19 at 13:22
https://stackoverflow.com/q/37709411/2700344 – leftjoin Dec 05 '19 at 14:06

