What is this "partitioning" thing in Hive and what am I supposed to do here?

Question

I have a file with the following columns, for which I'm being asked to "partition based on the extract date". "Extract date" is a column in the file. Here are the columns in the file:

Extract date
name
location
Extract date

Now, I have this file in my Unix directory.

What exactly am I being asked to do here?

This link will help you to understand the concept - https://stackoverflow.com/questions/19128940/what-is-the-difference-between-partitioning-and-bucketing-a-table-in-hive — arunkvelu, Jan 11 '19 at 14:11

score 0 · Answer 1 · answered Jan 11 '19 at 11:08

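Partitioning is a Hive feature that splits a table's data by the value of one or more columns, so that a query can target just the set of records it needs instead of scanning the whole table.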

First, create a table partitioned on the "Extract Date" column, like this:

create table <table_name> 
(
name string,
location string
)
partitioned by (extract_date string)
stored as TEXTFILE;

This creates the partitioned table. Note that extract_date is declared only in the partitioned by clause, not in the regular column list.
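To see what partitioning buys you, here is a small sketch of how the table could be inspected and queried once it holds data. The table name extract_partitioned and the date value are hypothetical, used only for illustration. Hive stores each partition in its own subdirectory (for example extract_date=2019-01-11), so a filter on the partition column lets it read only that directory.

```sql
-- Hypothetical table name; substitute the name used in your create statement.
-- List the partitions Hive currently knows about for this table.
SHOW PARTITIONS extract_partitioned;

-- Filtering on the partition column lets Hive skip every other partition.
-- The date literal is an illustrative value, not taken from the question.
SELECT name, location
FROM extract_partitioned
WHERE extract_date = '2019-01-11';
```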

Next, load the data from the file into the table. There are several ways to do this, for example:

Loading with a static partition, where you name the partition value yourself
Loading with dynamic partitioning, where you select the data from another table and Hive derives the partition values
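The two loading approaches can be sketched roughly as follows. This is only an outline under several assumptions: the table names extract_staging and extract_partitioned, the file path, the comma delimiter, the three-column file layout, and the date value are all hypothetical. Because the raw file carries the extract date as an ordinary column, the sketch first loads it into a plain staging table and then inserts from there into the partitioned table.

```sql
-- Hypothetical staging table that mirrors the raw file layout
-- (assumed here: three comma-separated columns).
CREATE TABLE extract_staging (
  extract_date STRING,
  name STRING,
  location STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Copy the file from the local Unix directory into the staging table.
-- The path is a placeholder.
LOAD DATA LOCAL INPATH '/path/to/extract_file.csv'
INTO TABLE extract_staging;

-- Static partition: you state the partition value explicitly,
-- and the select list omits the partition column.
INSERT OVERWRITE TABLE extract_partitioned
PARTITION (extract_date = '2019-01-11')
SELECT name, location
FROM extract_staging
WHERE extract_date = '2019-01-11';

-- Dynamic partition: Hive creates one partition per distinct value.
-- The partition column must come last in the select list.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE extract_partitioned
PARTITION (extract_date)
SELECT name, location, extract_date
FROM extract_staging;
```

The dynamic form is usually the more convenient one when the file contains many different extract dates, since you do not have to run one insert per date.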
