Questions tagged [hive-configuration]

Use for questions related to Hive configuration properties.

Official documentation: Hive configuration properties

44 questions
14 votes
2 answers

What is the hive command to see the value of hive.exec.dynamic.partition

We know that the set command is used to set values for properties: hive> SET hive.exec.dynamic.partition=true; hive> SET hive.exec.dynamic.partition.mode=non-strict; But how do we read the current value of the above property? I tried the below…
Surender Raja
  • 3,553
  • 8
  • 44
  • 80
11 votes
2 answers

Hive create table with inputs from nested sub-directories

I have data in Avro format in HDFS in file paths like: /data/logs/[foldername]/[filename].avro. I want to create a Hive table over all these log files, i.e. all files of the form /data/logs/*/*. (They're all based on the same Avro schema.) I'm…
Maxim Zaslavsky
  • 17,787
  • 30
  • 107
  • 173
9 votes
1 answer

Hive Map-Join configuration mystery

Could someone clearly explain the difference between the hive.auto.convert.join and hive.auto.convert.join.noconditionaltask configuration parameters? Also these corresponding size parameters: hive.mapjoin.smalltable.filesize and…
leftjoin
  • 36,950
  • 8
  • 57
  • 116
8 votes
1 answer

HIVE select count(*) non null returns higher value than select count(*)

I am currently doing some data exploration with Hive and cannot explain the following behavior. Say I have a table (named mytable) with a field master_id. When I count the number of rows I get select count(*) as c from mytable c 1129563 If I want…
z_eb
  • 93
  • 1
  • 6
7 votes
1 answer

Using reserved words in Hive

I'm migrating data to Hive 1.2, and I realized that, by default, I'm no longer allowed to use reserved words as column names. If you want to use reserved words, you need to explicitly set the below setting:…
Nadine
  • 1,620
  • 2
  • 15
  • 27
5 votes
1 answer

Create Table in Hive with one file

I'm creating a new table in Hive using: CREATE TABLE new_table AS select * from old_table; My problem is that after the table is created, it generates multiple files for each partition, while I want only one file for each partition. How can I…
Bramat
  • 979
  • 4
  • 24
  • 40
4 votes
1 answer

When to set hive parameters during a session?

I'm new to my role and part of it requires creating/inserting data into both managed and external hive tables. We have a few lines of 'set' parameters that we run at the beginning of a hive session, but I've run into a few cases, where, for example,…
phenderbender
  • 625
  • 2
  • 8
  • 18
4 votes
1 answer

pyhive: Set hive properties using pyhive

I have a complex Hive query whose underlying joins are Cartesian products, so I need to set the below properties. But when I execute these properties using pyhive, they fail to execute. I am getting an error asking to set properties for …
LUZO
  • 1,019
  • 4
  • 19
  • 42
3 votes
1 answer

Increase max row size in HIVE

I have a pyspark job with these configs: self.spark = SparkSession.builder.appName("example") \ .config("hive.exec.dynamic.partition", "true") \ .config("hive.exec.dynamic.partition.mode", "nonstrict") \ .config("hive.exec.max.dynamic.partitions",…
DrGenius
  • 817
  • 1
  • 9
  • 26
3 votes
2 answers

Avoid single file with hive.optimize.sort.dynamic.partition option

I'm using Hive. When I write dynamic partitions with an INSERT query and turn on the hive.optimize.sort.dynamic.partition option (SET hive.optimize.sort.dynamic.partition=true), there is always a single file in each partition. But if I turn off that option (SET…
Juhong Jung
  • 101
  • 1
  • 7
3 votes
2 answers

Hive - Can one extract common options for reuse in other scripts?

I have two Hive scripts which look like this: Script A: SET hive.exec.dynamic.partition=true; SET hive.exec.dynamic.partition.mode=non-strict; SET hive.exec.parallel=true; ... do something ... Script B: SET hive.exec.dynamic.partition=true; …
FirstName LastName
  • 1,891
  • 5
  • 23
  • 37
2 votes
0 answers

Is there a way to set Hive configurations using Hive magic in Jupyter notebook?

I am using Jupyter Notebook to crunch data in Hive and I want to set Hive configurations using Hive magic. Is there a way to do it? The sample code below does not work (please treat each block as one Jupyter Notebook cell). I can do this via HUE…
heinistic
  • 731
  • 2
  • 8
  • 16
2 votes
1 answer

Validate Hive Single and Multi Query Parallelism

I configured Hive parallelism with the below hive-site.xml properties and restarted the cluster. Property 1 Name: hive.exec.parallel Value: true Description: Run hive jobs in parallel Property 2 Name: hive.exec.parallel.thread.number Value: 8…
user1
  • 391
  • 3
  • 27
2 votes
1 answer

How to export hive query result to single local file?

I want to export a Hive query result to a single local file with a pipe delimiter. The Hive query contains an order by clause. I have tried the below solutions. Solution1: hive -e 'insert overwrite local directory '/problem1/solution' fields terminated by '|' select…
Mahebub A Sayyed
  • 325
  • 5
  • 14
2 votes
1 answer

What will happen if Hive number of reducers is different to number of keys?

In Hive I often run queries like: select columnA, sum(columnB) from ... group by ... I read some MapReduce examples, and one reducer can only produce one key. It seems the number of reducers depends entirely on the number of keys in columnA. Therefore,…
user3692015
  • 391
  • 4
  • 15