Questions tagged [hive-partitions]

Use this tag for questions about partitions in Hive.

Partitioning is a way of dividing a table into related parts based on the values of partition columns such as date, city, and department. With partitions, a query can read only the relevant portion of the data instead of scanning the whole table.

Partitions are essentially horizontal slices of data which allow larger sets of data to be separated into more manageable chunks. In Hive, partitioning is supported for both managed and external tables and is declared in the table definition.
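As a rough illustration of the idea described above (this is not Hive code), the Python sketch below mimics the `column=value` directory layout Hive uses for partitions and shows why a filter on a partition column only has to touch the matching directories. The helper names `partition_path`, `partition_rows`, and `prune`, as well as the sample rows, are hypothetical and exist only for this example; it also assumes partition values contain no `/`.

```python
from collections import defaultdict


def partition_path(row, partition_cols):
    """Build a Hive-style relative directory such as 'city=Paris/department=HR'."""
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


def partition_rows(rows, partition_cols):
    """Group rows by partition directory.

    The partition values are encoded in the path, so they are
    dropped from the stored row itself.
    """
    partitions = defaultdict(list)
    for row in rows:
        data = {k: v for k, v in row.items() if k not in partition_cols}
        partitions[partition_path(row, partition_cols)].append(data)
    return dict(partitions)


def prune(partitions, **filters):
    """Keep only the partitions whose path matches every key=value filter."""
    wanted = {f"{k}={v}" for k, v in filters.items()}
    return {
        path: rows
        for path, rows in partitions.items()
        if wanted <= set(path.split("/"))
    }


# Hypothetical sample data, partitioned by city and department.
rows = [
    {"name": "Ann", "city": "Paris", "department": "HR"},
    {"name": "Bob", "city": "Paris", "department": "IT"},
    {"name": "Cy", "city": "Oslo", "department": "HR"},
]
partitions = partition_rows(rows, ["city", "department"])

# Only the directories for department=HR are consulted; the IT data is never read.
hr_only = prune(partitions, department="HR")
```

In this sketch each distinct combination of partition values becomes one directory, which is also why a very high-cardinality partition column tends to produce a large number of small partitions.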

144 questions
15
votes
2 answers

External Hive Table Refresh table vs MSCK Repair

I have an external Hive table stored as Parquet, partitioned on a column, say as_of_dt, and data gets inserted via Spark Streaming. Every day a new partition gets added. I am running msck repair table so that the Hive metastore gets the newly added…
Ajith Kannan
  • 812
  • 1
  • 8
  • 30
15
votes
5 answers

How do I drop all partitions at once in hive?

Hive version 1.1 I have a hive external table as below: CREATE EXTERNAL TABLE `schedule_events`( `schedule_id` string COMMENT 'from deserializer', `service_key` string COMMENT 'from deserializer', `event_start_date_time` string COMMENT 'from…
Surender Raja
  • 3,553
  • 8
  • 44
  • 80
8
votes
4 answers

How to truncate a partitioned external table in hive?

I'm planning to truncate a Hive external table which has one partition. I used the following command to truncate the table: hive> truncate table abc; But it throws an error stating: Cannot truncate non-managed table abc. Can…
fervent
  • 123
  • 1
  • 2
  • 10
8
votes
4 answers

how to add columns to existing hive partitioned table?

alter table abc add columns (stats1 map, stats2 map) I altered my table with the above query, but when I checked the data afterwards I got NULLs for both extra columns. I'm not getting data.
Veeru Chow
  • 81
  • 1
  • 1
  • 3
7
votes
1 answer

Insert into static hive partition using Presto

Suppose I want to INSERT INTO a static hive partition, can I do that with Presto? The PARTITION keyword is only for hive. INSERT INTO TABLE Employee PARTITION (department='HR') Caused by: com.facebook.presto.sql.parser.ParsingException: line…
Tiberiu
  • 990
  • 2
  • 18
  • 36
7
votes
2 answers

pyspark - getting Latest partition from Hive partitioned column logic

I am new to PySpark. I am trying to get the latest partition (date partition) of a Hive table using PySpark DataFrames and have done it as below. But I am sure there is a better way to do it using DataFrame functions (not by writing SQL). Could you…
vinu.m.19
  • 495
  • 2
  • 8
  • 16
7
votes
1 answer

Spark Structured Streaming Writestream to Hive ORC Partitioned External Table

I am trying to use Spark Structured Streaming - writeStream API to write to an External Partitioned Hive table. CREATE EXTERNAL TABLE `XX`( `a` string, `b` string, `b` string, `happened` timestamp, `processed` timestamp, `d` string, `e` string, `f`…
6
votes
2 answers

Does DROP PARTITION delete data from external table in HIVE?

An external table in Hive is partitioned on year, month and day. Does the following query delete data from the external table for the specific partition referenced in this query? ALTER TABLE MyTable DROP IF EXISTS…
Dhiraj
  • 3,396
  • 4
  • 41
  • 80
5
votes
1 answer

What are the allowed data types of partition column in hive?

I am pretty sure that complex types like STRUCT can not be the type of a partition column. But I am not sure if all the primitive types are valid or not. I have read a lot of documentation but didn't find anything.
Wang Zhong
  • 125
  • 2
  • 9
5
votes
1 answer

How does hive handle insert into internal partition table?

I have a requirement to insert a stream of records into a Hive partitioned table. The table structure is something like CREATE TABLE store_transation ( item_name string, item_count int, bill_number int, ) PARTITIONED BY ( yyyy_mm_dd…
Nageswaran
  • 7,481
  • 14
  • 55
  • 74
5
votes
1 answer

Dynamic partition cannot be the parent of a static partition '3'

While inserting data into a table, Hive threw the error "Dynamic partition cannot be the parent of a static partition '3'" with the query below: INSERT INTO TABLE student_partition PARTITION(course , year = 3) SELECT name, id, course FROM student1 WHERE…
ram
  • 183
  • 2
  • 6
5
votes
1 answer

Create Table in Hive with one file

I'm creating a new table in Hive using: CREATE TABLE new_table AS select * from old_table; My problem is that after the table is created, it contains multiple files for each partition, while I want only one file for each partition. How can I…
Bramat
  • 979
  • 4
  • 24
  • 40
4
votes
1 answer

How to read filtered partitioned parquet files efficiently using pandas's read_parquet?

Let's say my data is stored in object storage, say S3, with a date-time partition layout like this: s3://my-bucket/year=2021/month=01/day=03/SOME-HASH-VAL1.parquet ... s3://my-bucket/year=2022/month=12/day=31/SOME-HASH-VAL1000.parquet According to pandas's…
user3595632
  • 5,380
  • 10
  • 55
  • 111
4
votes
2 answers

Understanding Hive table creation notation

I have come across Hive tables which I need to convert to Redshift/MySql equivalent. I am having trouble understanding Hive query structure and would appreciate some help: CREATE TABLE IF NOT EXISTS table_1 ( id BIGINT, price DOUBLE, …
madu
  • 5,232
  • 14
  • 56
  • 96
4
votes
1 answer

partitions in hive interview questions

1) If the partition column doesn't have data, what error will you get when you query on it? 2) If some rows don't have the partition column, how will those rows be handled? Will there be any data loss? 3) Why does bucketing need to be…
Anonymous
  • 193
  • 1
  • 13