1

How does Impala support the concept of partitioning and, if it supports it, what are the differences between Hive Partitioning and Impala Partitioning?

Mario Becerra
  • 514
  • 1
  • 6
  • 16
user3279174
  • 99
  • 3
  • 11
  • 1
    http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/v1/latest/Installing-and-Using-Impala/ciiu_partitioning.html – Remus Rusanu Oct 21 '14 at 10:36

1 Answers1

1

By default, all the data files for a table are located in a single directory.

Partitioning is a technique for physically dividing the data during loading, based on values from one or more columns, to speed up queries that test those columns.

For example, with a school_records table partitioned on a year column, there is a separate data directory for each different year value, and all the data for that year is stored in a data file in that directory. A query that includes a WHERE condition such as YEAR=1966, YEAR IN (1989,1999), or YEAR BETWEEN 1984 AND 1989 can examine only the data files from the appropriate directory or directories, greatly reducing the amount of data to read and test.

Static and Dynamic Partitioning

Specifying all the partition columns in a SQL statement is called "static partitioning" ,because the statement affects a single predictable partition. For example, you use static partitioning with an ALTER TABLE statement that affects only one partition, or with an INSERT statement that inserts all values into the same partition:

insert into t1 partition(x=10, y='a') select c1 from some_other_table;

When you specify some partition key columns in an INSERT statement, but leave out the values, Impala determines which partition to insert This technique is called "dynamic partitioning":

insert into t1 partition(x, y='b') select c1, c2 from some_other_table;

Create new partition if necessary based on variable year, month, and day; insert a single value.

insert into weather partition (year, month, day) select 'cloudy',2014,4,21;

Create new partition if necessary for specified year and month but variable day; insert a single value.

insert into weather partition (year=2014, month=04, day) select 'sunny',22;

The more key columns you specify in the PARTITION clause, the fewer columns you need in the SELECT list. The trailing columns in the SELECT list are substituted in order for the partition key columns with no specified value.

You may refer to this link for further reading.

Hope that helps!

Rahul
  • 2,354
  • 3
  • 21
  • 30