0

i have a Hive table (A) with two columns, and both are sorted in an order but with duplicates. Need to insert only the unique values from table A into table B but the order should be preserved. Unable to do this with distinct as the sorting order gets changed.

mazaneicha
  • 8,794
  • 4
  • 33
  • 52

1 Answers1

0

If the question is about sorted bucketed table, read this answer: https://stackoverflow.com/a/41249147/2700344 You should add cluster by/distribute by + sort by when inserting the data.

If you are selecting such sorted table and expect the dataset returned is sorted without order by clause, read this answer: https://stackoverflow.com/a/47416027/2700344

I short: select without order by does not guarantee the order.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Thanks @leftjoin, cluster by or sort by works to preserve the order while inserting in an hive table. Do you know if there is any similar way that we can achieve this in Impala as well as the same query that works in hive not works in Impala – Srini Jayaram Oct 02 '20 at 08:14
  • @SriniJayaram No, unfortunately, I do not know how to do it in Impala – leftjoin Oct 02 '20 at 08:24