4

I have a Hive Table with 2 columns.Employee ID and Salary.

Data is something like given below.

Employee ID Salary
1   10000.08
2   20078.67
3   20056.45
4   30000.76
5   10045.14
6   43567.76

I want to create Partitions based on Salary Column.For Example Partition for salary range 10000 to 20000, 20001 to 30000.

How do i achieve this.

leftjoin
  • 36,950
  • 8
  • 57
  • 116
Surbhi
  • 43
  • 1
  • 3

1 Answers1

6

Hive does not support range partitioning, but you can calculate ranges during data load.

  1. Create table partitioned by salary_range:

    create table your_table
    (
     employee_id bigint,
     salary double
    )
    partitioned by (salary_range bigint)
    
  2. insert using case for salary range calculation:

    insert overwrite table your_table partition (salary_range)   
    select employee_id, salary,  
           case 
               when salary between 10000 and 20000 then 20000
               when salary between 20001 and 30000 then 30000 
               ...
               else ...
           end as salary_range 
    from some_table;
    
Boxuan
  • 4,937
  • 6
  • 37
  • 73
leftjoin
  • 36,950
  • 8
  • 57
  • 116