Hive Array> Insertion shows null

Question

I created a table temp that has array of struct

create table temp (regionkey smallint, name string, comment string, nations array<struct<n_nationkey:smallint,n_name:string,n_comment:string>>) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '|' 
COLLECTION ITEMS TERMINATED BY ',';

Then I loaded the data into the table

LOAD DATA LOCAL INPATH '/Data Sets/region.csv' INTO TABLE temp;

Desired output when did

select * from temp;

is

4   EUROPE  Low sale Business Region [{"n_nationkey":22,"n_name":"Ryan","n_comment":"Reference the site"}]

But actual output is

4   EUROPE  Low sale Business Region    [{"n_nationkey":22,"n_name":null,"n_comment":null},{"n_nationkey":null,"n_name":null,"n_comment":null},{"n_nationkey":null,"n_name":null,"n_comment":null}]

DATA FILE

4|EUROPE|Low sale Business Region for Training4Exam.com|7,Bulgaria,Reference 
4|EUROPE|Low sale Business Region for HadoopExam.com|19,Belgium,Reference site 
4|EUROPE|Low sale Business Region for Training4Exam.com|22,Ryan,Reference site

This was my first exam with arrays and struct and I am blank on this. Any help with be highly appreciated. Thanks

try to add some samples from your region.csv. the most probably reason is your data doesn't match the table layout — hlagos, Jun 14 '17 at 23:04

David דודו Markovitz · Accepted Answer · 2017-06-20T20:24:11.237

map keys terminated by ','

create external table temp 
(
    regionkey   smallint
   ,name        string
   ,comment     string
   ,nations     array<struct<n_nationkey:smallint,n_name:string,n_comment:string>>
) 
row format delimited 
fields terminated by '|' 
map keys terminated by ','
;

select * from temp
;

+-----------+--------+------------------------------------------------+-----------------------------------------------------------------------+
| regionkey |  name  |                    comment                     |                                nations                                |
+-----------+--------+------------------------------------------------+-----------------------------------------------------------------------+
|         4 | EUROPE | Low sale Business Region for Training4Exam.com | [{"n_nationkey":7,"n_name":"Bulgaria","n_comment":"Reference "}]      |
|         4 | EUROPE | Low sale Business Region for HadoopExam.com    | [{"n_nationkey":19,"n_name":"Belgium","n_comment":"Reference site "}] |
|         4 | EUROPE | Low sale Business Region for Training4Exam.com | [{"n_nationkey":22,"n_name":"Ryan","n_comment":"Reference site"}]     |
+-----------+--------+------------------------------------------------+-----------------------------------------------------------------------+

FYI

To be backward-compatible, initialize the first 3 separator to be the given values from the table properties.
The default number of separators is 8;
if only hive.serialization.extend.nesting.levels is set, the number of separators is extended to 24;
if hive.serialization.extend.additional.nesting.levels is set, the number of separators is extended to 154.
@param tableProperties table properties to extract the user provided separators

https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySerDeParameters.java

Found 2 things on the way: **(1)** bug in serde.thrift https://github.com/apache/hive/blob/master/serde/if/serde.thrift `const string COLLECTION_DELIM = "colelction.delim"` (note the typo in `colelction`) **(2)** The documentation of the delimiters is problematic. It is actually not fields, collection items and map keys but level 1,2, and 3 and there is even an additional level with default of ascii 4 — David דודו Markovitz, Jun 20 '17 at 19:39

Mohsen Sichani · Answer 2 · 2021-03-02T02:19:04.120

David's answer is very efficient & I liked it very much but cannot understand why collection items must be replaced by map keys (Seems there is a bug in Hive based on the description as he has suggested, I am not a pro in coding).

However, this is the long version

create table regiontemp(str string);
load data inpath '/user/cloudera/MohsenFiles/first_first.csv' into table regiontemp;
create external table region (r_regionkey smallint,
r_name      string,
r_comment   string,
r_nations   array<struct<n_nationkey:smallint,n_name:string,n_comment:string>>)
row format delimited
fields terminated by '|'
collection items terminated by ','

insert overwrite table region
select split(str,'\\|')[0] r_regionkey,
split(str,'\\|')[1] r_name,
split(str,'\\|')[2] r_comment,
array(named_struct("n_nationkey",cast(split(split(str,'\\|')[3],",")[0] as smallint),
                             "n_name",split(split(str,'\\|')[3],",")[1] ,
                             "n_comment",split(split(str,'\\|')[3],",")[2] ))
from regiontemp ;

now in impala INVALIDATE METADATA;

or in Hive (Aggregation On Struct columns Hive again based on David's answer for another Q)

Hive Array> Insertion shows null

2 Answers2

FYI