Questions tagged [hive-serde]

SerDe is short for Serializer/Deserializer. Hive uses the SerDe interface both for serialization and deserialization during I/O and for interpreting the results of serialization as individual fields. A SerDe allows Hive to read data from a table and write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats.

Official documentation page: SerDe

Many SerDes are bundled with Hive, and third-party SerDes are also available, for example:

  • LazySimpleSerDe
  • OpenCSVSerde
  • RegexSerDe
  • JsonSerDe
  • AvroSerDe
  • ParquetHiveSerDe
  • OrcSerde
  • MultiDelimitSerDe
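To make the serialize/deserialize contract described above more concrete, here is a minimal Python sketch of a toy delimited-text SerDe. It is not Hive's Java SerDe API: the class `ToyDelimitedSerDe`, its parameters, and the `PARSERS` type table are hypothetical. It only borrows two defaults of Hive text tables (the `\x01` Ctrl-A field delimiter and `\N` as the NULL marker) and, roughly like LazySimpleSerDe, treats missing trailing fields and unparsable values as NULL.

```python
from typing import Any, Callable, Dict, List

# Hypothetical toy SerDe illustrating the serialize/deserialize contract only.
# This is NOT Hive's Java SerDe interface (org.apache.hadoop.hive.serde2).
PARSERS: Dict[str, Callable[[str], Any]] = {"string": str, "int": int, "double": float}


class ToyDelimitedSerDe:
    def __init__(self, columns: List[str], types: List[str],
                 sep: str = "\x01", null: str = "\\N") -> None:
        # Defaults borrowed from Hive text tables: Ctrl-A field delimiter, \N for NULL.
        if len(columns) != len(types):
            raise ValueError("columns and types must have the same length")
        unknown = [t for t in types if t not in PARSERS]
        if unknown:
            raise ValueError(f"unsupported types: {unknown}")
        self.columns = columns
        self.types = types
        self.sep = sep
        self.null = null

    def serialize(self, row: Dict[str, Any]) -> str:
        # Emit fields in declared column order; None (or a missing key) becomes the NULL marker.
        return self.sep.join(
            self.null if row.get(c) is None else str(row[c]) for c in self.columns
        )

    def deserialize(self, line: str) -> Dict[str, Any]:
        # Split one record into raw fields.
        fields = line.split(self.sep)
        # Pad missing trailing fields with NULL; zip() below silently drops extra fields.
        fields += [self.null] * (len(self.columns) - len(fields))
        record: Dict[str, Any] = {}
        for name, typ, raw in zip(self.columns, self.types, fields):
            if raw == self.null:
                record[name] = None
                continue
            try:
                record[name] = PARSERS[typ](raw)
            except ValueError:
                # A value that cannot be parsed as its declared type becomes NULL.
                record[name] = None
        return record
```

For example, `ToyDelimitedSerDe(["id", "name"], ["int", "string"]).deserialize("7\x01ann")` yields `{"id": 7, "name": "ann"}`. A real Hive SerDe additionally exposes an ObjectInspector so Hive can navigate the deserialized fields; the sketch leaves that out.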
164 questions
14 votes, 2 answers

Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive

There is an issue when I execute SHOW CREATE TABLE on an ORC table and then run the resulting CREATE TABLE statement. SHOW CREATE TABLE returns this: STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT …
Jason
11 votes, 1 answer

What is the difference between 'InputFormat, OutputFormat' & 'Stored as' in Hive?

I'm new to big data and currently learning Hive. I understand the concept of InputFormat and OutputFormat in Hive as part of SerDe. I also understand that 'Stored as' is used to store a file in a particular format, just like InputFormat. But I don't…
Metadata
9 votes, 1 answer

Why do all columns get created as string when I use OpenCSVSerde in Hive?

I am trying to create a table using OpenCSVSerde with some integer and date columns, but the columns get converted to String. Is this the expected outcome? As a workaround, I do an explicit type cast after this step (which makes the complete run…
ForeverLearner
6 votes, 2 answers

JSON4S type hint does not work

The following test snippet: implicit val formats = DefaultFormats + FullTypeHints(Contacts.classList) val serialized = Serialization.write(List(Mail(field = "random@mail.com", note = "Random…
Dyin
6 votes, 1 answer

How to query struct array with Hive (get_json_object) or json serde

I am trying to query the following JSON example file stored on my HDFS: { "tag1": "1.0", "tag2": "blah", "tag3": "blahblah", "tag4": { "tag4_1": [{ "tag4_1_1": [{ "tag4_1_1_1": { …
DatWunGuy102
6 votes, 2 answers

SerDe properties list for AWS Athena (JSON)

I'm testing AWS Athena, and so far it is working very well. But I want to know the list of SerDe properties. I've searched far and wide and couldn't find it. For example, I'm using "ignore.malformed.json" = "true", but I'm pretty…
Laerion
4 votes, 2 answers

How to build a Hive table on data separated by the '^P' delimiter

My query is: CREATE EXTERNAL TABLE gateway_staging ( poll int, total int, transaction_id int, create_time timestamp, update_time timestamp ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '^P'; (I am not sure whether '^P' can be used as a…
Andy Reddy
3 votes, 1 answer

Log data loaded into a Hive table using RegexSerDe is NULL

I want to parse this log sample: May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1 May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive:…
Programmeur
3 votes, 0 answers

Hive: Challenges loading nested JSON data into Hive tables

I am trying to load deeply nested JSON data into Hive tables. Here is what I have tried so far. 1- I have JSON files that are deeply nested, such as arrays of structs that in turn have struct fields. 2- I successfully loaded this JSON data…
Hadoop-worker
3 votes, 2 answers

Hive SQL SerDe: how to not quote my fields?

Since the SerDe quotes fields with " by default, how can I avoid quoting my fields? I tried: row format serde "org.apache.hadoop.hive.serde2.OpenCSVSerde" with serdeproperties( "separatorChar" = ",", "quoteChar" = "") but I'm getting FAILED:…
woshitom
2 votes, 0 answers

Hive Parquet Timestamp Serde issues

I have a Parquet file with a timestamp column serialized as Long (BigInt). My Hive table is as below: Create external table my_table ( column_1 string, column_2 timestamp) partitioned by ( column_1 string) Row format Serde …
Kartik
2 votes, 1 answer

OpenCSVSerde escapeChar overriding quoteChar

I have a number of CSVs I'm importing into Hive, and I've found that my escapeChar for a new line is being triggered even when it is within a quoted field (enclosed by my quoteChar). Is there any straightforward way around this? Line1field1…
Jgreen727
2 votes, 1 answer

HIVE - escape double quote issue

I am trying to load a pipe-delimited CSV into a Hive external table. Pipes occurring within data fields are enclosed in quotes. Double quotes occurring within data are escaped with \. When I configure the external table, I see data with…
Naresh S
2 votes, 0 answers

Athena: no viable alternative at input for 'create external table' in Python

I am trying to create an Athena table for S3 server access logs in my Python CDK code with the help of this AWS link: https://aws.amazon.com/premiumsupport/knowledge-center/analyze-logs-athena/ The table gets created successfully from the Athena…
2 votes, 0 answers

JSON SerDe JAR not getting detected while creating a Table in Hive

Can anyone please suggest a proper solution for the error scenario below? My Hadoop and ecosystem versions are: Hadoop Version: 2.7.1, Hive Version: 1.2.2. I was trying to install and configure the JSON SerDe into my Hive's…
Karthik Velu