Questions tagged [apache-hive]

Apache Hive supports analysis of large datasets stored in Hadoop's HDFS and compatible file systems such as the Amazon S3 filesystem. It provides an SQL-like language called HiveQL with schema on read and transparently converts queries to MapReduce, Apache Tez, and Spark jobs. All three execution engines can run in Hadoop YARN. To accelerate queries, it provides indexes, including bitmap indexes.

A few features:

1. Indexing to provide acceleration; index types include compaction and bitmap indexes as of 0.10, and more index types are planned.
2. Different storage types such as plain text, RCFile, HBase, ORC, and others.
3. Metadata storage in an RDBMS, significantly reducing the time to perform semantic checks during query execution.
4. Operating on compressed data stored in the Hadoop ecosystem using algorithms including DEFLATE, BWT, Snappy, etc.
5. Built-in user-defined functions (UDFs) to manipulate dates, strings, and other data-mining tools. Hive supports extending the UDF set to handle use cases not supported by built-in functions.
6. SQL-like queries (HiveQL), which are implicitly converted into MapReduce, Tez, or Spark jobs.
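
As a rough illustration of the features listed above, a HiveQL session might look like the following sketch. The table name `page_views`, its columns, and the choice of Tez are assumptions made for this example and do not come from the tag description; `hive.execution.engine` can be set to `mr`, `tez`, or `spark`, depending on what the cluster has installed.

```sql
-- Pick the execution engine for this session (assumes Tez is available on the cluster).
SET hive.execution.engine=tez;

-- Hypothetical table stored in the ORC format.
CREATE TABLE IF NOT EXISTS page_views (
  user_id   BIGINT,
  url       STRING,
  view_time TIMESTAMP
)
STORED AS ORC;

-- A SQL-like query using built-in functions; Hive compiles it
-- into jobs for the engine selected above.
SELECT to_date(view_time) AS view_date,
       count(*)           AS views
FROM page_views
GROUP BY to_date(view_time);
```

Switching the first statement to `mr` or `spark` should leave the query text unchanged, since the conversion to jobs happens behind the HiveQL layer.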

96 questions
13
votes
1 answer

Distinct on Multiple columns in Hive

Hi, does Hive support distinct on multiple columns, like select distinct(a, b, c, d) from table? If not, is there a way to achieve this?
Bhaskar Mishra
  • 3,332
  • 7
  • 26
  • 36
11
votes
2 answers

How do I access HBase table in Hive & vice-versa?

As a developer, I've created an HBase table for our project by importing data from an existing MySQL table using a Sqoop job. The problem is that our data analyst team is familiar with MySQL syntax, which implies they can query a Hive table easily. For them, I need to…
Abhishek
  • 6,912
  • 14
  • 59
  • 85
8
votes
3 answers

Select all columns of a Hive Struct

I have a requirement to select all columns from a Hive struct. The Hive create table script is below: Create Table script. Select * from the table displays each struct as a column: select * from table. The requirement I have is to display all…
Abhijit Nayak
  • 101
  • 1
  • 1
  • 3
8
votes
2 answers

Insert timestamp into Hive

Hi, I'm new to Hive and I want to insert the current timestamp into my table along with a row of data. Here is an example of my team table: team_id int, fname string, lname string, time timestamp. I have looked at some other examples, How to…
Frostie_the_snowman
  • 629
  • 3
  • 9
  • 17
8
votes
2 answers

Apache Hive MSCK REPAIR TABLE new partition not added

I am new to Apache Hive. While working on an external table partition, if I add a new partition directly to HDFS, the new partition is not added after running MSCK REPAIR TABLE. Below is the code I tried: -- creating external table hive> create…
Green
  • 111
  • 1
  • 2
  • 7
7
votes
1 answer

Hive subquery in where clause (Select * from table 1 where dt > (Select max(dt) from table2) )..please suggest an alternative

I am looking for something in Hive like Select * from table 1 where dt > (Select max(dt) from table2). Obviously Hive doesn't support subqueries in the where clause and also, even if I use joins or a semi join, it compares only = and not > (As far as I…
user2957483
  • 71
  • 1
  • 1
  • 4
4
votes
1 answer

Get all Hive table/database creation/deletion details (audit logs)

Let's say I have a database, project. I created a table named tab1 and then later tab2. Now I dropped the table tab1. Where do I look for the logs that say I have dropped the table tab1 from the database project? I would like to get the time, user…
K S Nidhin
  • 2,622
  • 2
  • 22
  • 44
4
votes
2 answers

Is it possible to add new column partition to already existing partitioned table in hive

I have a partitioned table called employee_part. This table is partitioned by hiredate. It has metadata as given below. When I tried to add a new column partition to the employee_part table, I'm getting an error saying ALTER TABLE employee_part ADD…
marjun
  • 696
  • 5
  • 17
  • 30
3
votes
5 answers

Spark SQL on ORC files doesn't return correct Schema (Column names)

I have a directory containing ORC files. I am creating a DataFrame using the below code: var data = sqlContext.sql("SELECT * FROM orc.`/directory/containing/orc/files`"); It returns a DataFrame with this schema: [_col0: int, _col1: bigint]. Whereas…
Ramu Malur
  • 129
  • 2
  • 10
3
votes
1 answer

Apache Hive - Single Insert Date Value

I'm trying to insert a date into a date column using Hive. So far, here's what I've tried: INSERT INTO table1 (EmpNo, DOB) VALUES ('Clerk#0008000', cast(substring(from_unixtime(unix_timestamp(cast('2016-01-01' as string), 'yyyy-MM-dd')),1,10) as…
Abbas Gadhia
  • 14,532
  • 10
  • 61
  • 73
3
votes
0 answers

Is there a way to keep track of schema change in a Hive metastore?

I'm looking for a possible solution to keep track of all schema changes in a Hive metastore, such as creating a new table, adding/removing columns, changing a column type, etc. I haven't found any so far. Should I just monitor the MySQL db that stores the meta…
piggybox
  • 1,689
  • 1
  • 15
  • 19
3
votes
2 answers

Why do I get the error "Thrift::TException=HASH(0x122b9e0)" when I try to execute a statement with Thrift::API::HiveClient?

I am trying to connect to Apache Hive from a Perl script but I'm getting the following error: Thrift::TException=HASH(0x122b9e0) I am running with Hadoop version 2.7.0, Hive version 1.1.0, and Thrift::API::HiveClient version 0.003. Here is the…
Koushik Chandra
  • 1,565
  • 12
  • 37
  • 73
3
votes
1 answer

How to change the length of a column name in a Hive table?

I have a Hive table where the column names are longer than usual. I referred to the Hive metastore for the table definition. This is how it looks: DESCRIBE hive.columns_v2; Output: Name || Null || Type ----------- …
trips
  • 111
  • 1
  • 9
3
votes
2 answers

Can a Hive custom SerDe produce multiple rows?

I am using Hive 0.13.1 and I created a custom SerDe that is able to process a special kind of XML data. So far so good. I also created a class for the InputFormat that splits the input data. Is it possible that I produce multiple rows (output) in…
S. Walz
  • 31
  • 1
2
votes
2 answers

I'm installing Hive 2.0.0 with Hadoop 2.7.2

I'm trying to install Hive 2.0.0 with Hadoop 2.7.2, but I don't know what the problem is in my execution: parallels@ubuntu:/usr/local/apache-hive-2.0.0-bin$ ./bin/hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in…