I have table A, which is truncated and reloaded from each monthly file, and table B, which is append-only.
So table A is a file-to-table load in Hive, and table B is populated by inserting (appending) the data from table A.
The issue is that table B is loaded with a straight SELECT from table A, so the same data can be inserted more than once, creating duplicates.
How should I write the SELECT query that inserts data from table A? Both tables have file_date as a column. A left join between A and B gives wrong counts in the insert.
Also, NOT EXISTS is not working for me in Hive.
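One way the NOT EXISTS check could be expressed is as a left anti-join. The following is only a minimal, untested sketch. It assumes that file_date alone identifies a loaded file, that dist.t2 is loaded with dynamic partitioning on yearmonth, and that the non-partition columns of dist.t2 are in the order shown (the column list is a guess based on the script in this post). Joining against the distinct file_date values of dist.t2, rather than against the full table, keeps the join from multiplying rows, which is a common cause of inflated counts.

```sql
-- Sketch only: column names and order are assumed; adjust to the real dist.t2 schema.
-- Dynamic partition inserts need these settings, and the partition column (yearmonth) goes last.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE dist.t2 PARTITION (yearmonth)
SELECT a.person_sk,
       a.np_id,
       a.insert_date,
       a.file_date,
       a.yearmonth
FROM raw.ma a
LEFT JOIN (
    -- One row per file already loaded, so each raw.ma row matches at most once
    SELECT DISTINCT file_date FROM dist.t2
) b
  ON a.file_date = b.file_date
WHERE b.file_date IS NULL;   -- keep only rows whose file_date is not yet in dist.t2
```

Rows in raw.ma with a NULL file_date never match the join and would be inserted on every run, so they may need separate handling.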
The issue in detail:
Append table script (dist.t2 is partitioned by yearmonth):
INSERT INTO TABLE dist.t2
SELECT Person_sk, Np_id, Yearmonth, Insert_date File_date
FROM raw.ma
Data in table raw.ma (this table is truncated and reloaded):
File1 data: 201902
File2 data: 201903
File3 data: 201904
File4 data: if 201902 data gets loaded to the table again, it should not duplicate the File1 data. It should either not get inserted or should overwrite that partition.
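The "overwrite that partition" behavior could be sketched with INSERT OVERWRITE and dynamic partitioning, which in Hive replaces only the partitions that appear in the SELECT output and leaves the others untouched. This is a hedged sketch, not a tested script: the column list and its order are assumptions, and the partition column yearmonth must be the last column selected.

```sql
-- Sketch only: column names and order are assumed; adjust to the real dist.t2 schema.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Replaces only the yearmonth partitions present in raw.ma (for example 201902);
-- partitions for other months in dist.t2 are left as they are.
INSERT OVERWRITE TABLE dist.t2 PARTITION (yearmonth)
SELECT person_sk,
       np_id,
       insert_date,
       file_date,
       yearmonth
FROM raw.ma;
```

If the statement is run through Spark SQL rather than Hive, the overwrite behavior can depend on the table type and on the spark.sql.sources.partitionOverwriteMode setting, so it is worth checking on a test table first.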
Here I need a filter or WHERE condition when appending data into dist.t2.
Can you please help with this?
I tried dropping the table partition with ALTER TABLE in Hive, but it fails in the Spark framework.
Please help me avoid inserting duplicate entries.