1

I have a field which holds a short list of ids of a fixed length. e.g. aab:aac:ada:afg

The field is intended to hold at most 5 ids, growing gradually. I update it by adding from a similarly constructed field that may partially overlap with my existing set, e.g. ada:afg:fda:kfc.

The field expans when joined to an "update" table, as in the following example. Here, id_list is the aforementioned list I want to "merge", and table_update is a table with new values I want to "merge" into table1.

insert overwrite table table1
select 
  id,
  field1,
  field2,
  case
    when (some condition) then a.id_list
    else merge(a.id_list, b.id_list)
  end as id_list

from table1 a
left join
table_update b
on a.id = b.id;

I'd like to produce a combined field with the following value: aab:aac:ada:afg:fda.

The challenge is that I don't know whether or how much overlap the strings have until execution, and I cannot run any external code, or create UDFs.

Any suggestions how I could approach this?

Jono
  • 202
  • 4
  • 13

1 Answers1

1

Split to get arrays, explode them, select existing union all new, aggregate using collect_set, it will produce unique array, concatenate array into string using concat_ws(). Not tested:

select concat_ws(':',collect_set(id)) 
from
(
select explode(split('aab:aac:ada:afg',':')) as id  --existing
union all 
select explode(split('ada:afg:fda:kfc',':')) as id  --new
);

You can use UNION instead UNION ALL to get distinct values before aggregating into array. Or you can join new and existing and concatenate strings into one, then do the same:

select concat_ws(':',collect_set(id)) 
from
(
select explode(split(concat('aab:aac:ada:afg',':','ada:afg:fda:kfc'),':')) as id  --existing+new
);

Most probably you will need to use lateral view with explode in the real query. See this answer about lateral view usage

Update:

insert overwrite table table1

select  concat_ws(':',collect_set(a.idl)) as id_list,
       id,
       field1,
       field2
from
(
select 
  id,
  field1,
  field2,
  split(
  case
    when (some condition) then a.id_list
    when b.id_list is null then a.id_list
    else concat(a.id_list,':',b.id_list)
  end,':') as id_list_array 

from table1 a
     left join table_update b on a.id = b.id
)s

LATERAL VIEW OUTER explode(id_list_array ) a AS idl
group by 
       id,
       field1,
       field2
;
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Ah! Ok, that's a start, but I need to add a clarification to the question: the fields are in two tables, and I want to combine them for each row, not two single strings. – Jono Sep 03 '18 at 11:31
  • @Jono you can join new and existing and concatenate strings into one, then do the same – leftjoin Sep 03 '18 at 12:00
  • Cool! I'll see if I can implement this and get back here. – Jono Sep 04 '18 at 10:27