We know that the SET command is used to set values for properties:
hive> SET hive.exec.dynamic.partition=true;
hive> SET hive.exec.dynamic.partition.mode=nonstrict;
But how do we read the current value of the above property?
I tried the below…
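For what it's worth, issuing SET with only the property name (no =value) makes Hive print the current setting as name=value. Below is a minimal Python sketch of reading that back. It assumes a DB-API style cursor (for example, one obtained from PyHive) that returns the name=value line as a single-column row. The helper name read_hive_setting and the row shape are illustrative assumptions, not something taken from the question.

```python
def read_hive_setting(cursor, name):
    """Return the current value of a Hive property, or None if no value came back.

    Assumption: `cursor` is a DB-API style cursor (hypothetical here) and Hive
    answers `SET <name>` with a row shaped like ('<name>=<value>',).
    """
    # SET with a bare property name asks Hive to report the current value.
    cursor.execute(f"SET {name}")
    for row in cursor.fetchall():
        key, sep, value = row[0].partition("=")
        # Only accept a row that really is "<name>=<value>" for this property.
        if sep and key.strip() == name:
            return value.strip()
    return None  # no "<name>=<value>" row was returned
```

Called as read_hive_setting(cursor, "hive.exec.dynamic.partition"), this would return the string after the equals sign, so boolean-like settings come back as text rather than as a Python bool.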
I have data in Avro format in HDFS in file paths like: /data/logs/[foldername]/[filename].avro. I want to create a Hive table over all these log files, i.e. all files of the form /data/logs/*/*. (They're all based on the same Avro schema.)
I'm…
Could someone clearly explain the difference between the
hive.auto.convert.join
and
hive.auto.convert.join.noconditionaltask
configuration parameters?
Also these corresponding size parameters:
hive.mapjoin.smalltable.filesize
and…
I am currently doing some data exploration with Hive and cannot explain the following behavior. Say I have a table (named mytable) with a field master_id.
When I count the number of rows, I get:
select count(*) as c from mytable
c
1129563
If I want…
I'm migrating data to Hive 1.2, and I realized that, by default, I'm no longer allowed to use reserved words as column names. If you want to use reserved words, you need to explicitly set the below setting:…
I'm creating a new table in Hive using:
CREATE TABLE new_table AS select * from old_table;
My problem is that after the table is created, it generates multiple files for each partition, while I want only one file for each partition.
How can I…
I'm new to my role, and part of it requires creating and inserting data into both managed and external Hive tables. We have a few lines of 'set' parameters that we run at the beginning of a Hive session, but I've run into a few cases where, for example,…
I have a complex Hive query whose underlying joins are Cartesian products, so I need to set the properties below. But when I try to set these properties using pyhive, they fail to execute. I am getting an error asking to set properties for …
I have a pyspark job with these configs:
self.spark = SparkSession.builder.appName("example") \
.config("hive.exec.dynamic.partition", "true") \
.config("hive.exec.dynamic.partition.mode", "nonstrict") \
.config("hive.exec.max.dynamic.partitions",…
I'm using Hive.
When I write dynamic partitions with an INSERT query and turn on the hive.optimize.sort.dynamic.partition option (SET hive.optimize.sort.dynamic.partition=true), there is always a single file in each partition.
But if I turn off that option (SET…
I have two Hive scripts which look like this:
Script A:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.parallel=true;
... do something ...
Script B:
SET hive.exec.dynamic.partition=true;
…
I am using Jupyter Notebook to crunch data in Hive and I want to set Hive configurations using Hive magic. Is there a way to do it?
The sample code below does not work (please treat each block as one Jupyter Notebook cell). I can do this via HUE…
I configured Hive parallelism with the hive-site.xml properties below and restarted the cluster:
Property 1
Name: hive.exec.parallel
Value: true
Description: Run hive jobs in parallel
Property 2
Name: hive.exec.parallel.thread.number
Value: 8…
I want to export a Hive query result to a single local file with a pipe delimiter.
The Hive query contains an ORDER BY clause.
I have tried the solutions below.
Solution1:
hive -e 'insert overwrite local directory '/problem1/solution' fields terminated by '|' select…
In Hive I often run queries like:
select columnA, sum(columnB) from ... group by ...
I read some MapReduce examples, and it looks as if one reducer can only produce one key. It seems the number of reducers depends entirely on the number of keys in columnA.
Therefore,…
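A small sketch of how keys are usually spread over reducers may help frame this. In MapReduce-style shuffles, each key is routed to a reducer by a partitioner (by default, a hash of the key modulo the reducer count), so a single reducer generally handles many keys and the reducer count need not match the number of distinct keys. The Python below only simulates that idea. The assign_reducer and group_by_sum helpers, the CRC32 hash, and the sample rows are illustrative assumptions, not Hive's actual implementation.

```python
import zlib
from collections import defaultdict

def assign_reducer(key, num_reducers):
    """Pick a reducer for a key: stable hash of the key modulo the reducer count."""
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers

def group_by_sum(rows, num_reducers):
    """Simulate `select columnA, sum(columnB) ... group by columnA`.

    Returns one dict per reducer, mapping each key that reducer received
    to the sum of its columnB values.
    """
    reducers = [defaultdict(int) for _ in range(num_reducers)]
    for column_a, column_b in rows:
        # Every row with the same key lands on the same reducer.
        reducers[assign_reducer(column_a, num_reducers)][column_a] += column_b
    return [dict(r) for r in reducers]

# Hypothetical sample data: five distinct keys, but only two reducers.
rows = [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("d", 5), ("e", 6)]
per_reducer = group_by_sum(rows, num_reducers=2)

# Merging the per-reducer outputs gives the full GROUP BY result.
merged = {k: v for part in per_reducer for k, v in part.items()}
```

With five distinct keys and two reducers, at least one reducer necessarily ends up emitting several keys, while each key still appears in exactly one reducer's output.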