I am trying to find the best way to implement the following pipeline in Hive and HDFS:
I would like to ingest a CSV file (no problem there so far), but I want the partitions to be derived from a field that comes inside the CSV files themselves.
I created an external table that points to the files on HDFS and defined the partition column, but with a simple hdfs put (which makes sense) the partitions are not created, and I get an exit code 1 when I try to run MSCK REPAIR TABLE.
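For context, the external table I created looks roughly like this (table name, columns, and the path are simplified placeholders, not my real schema):

    CREATE EXTERNAL TABLE raw_events (
      id STRING,
      value STRING
    )
    PARTITIONED BY (event_date STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/data/raw_events';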
I would like to know whether the following approach is possible or viable:
Load the CSV file into an internal (managed) Hive table that acts as a temporary staging table.
Insert from that staging table into the "official" partitioned table.
Is this an efficient way to do it? If so, I haven't found much information about how to do the first step.
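This is roughly what I have in mind, in case it helps clarify the question (staging_events, the event_date column, and the input path are just placeholders; the target table is the partitioned one from above):

    -- Staging table: plain managed table, no partitions, matching the CSV layout
    CREATE TABLE staging_events (
      id STRING,
      value STRING,
      event_date STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- Move the raw CSV file from its landing location in HDFS into the staging table
    LOAD DATA INPATH '/landing/events.csv' INTO TABLE staging_events;

    -- Let Hive derive the partition from the event_date column (dynamic partitioning)
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;

    INSERT INTO TABLE raw_events PARTITION (event_date)
    SELECT id, value, event_date
    FROM staging_events;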
Thanks.