1

I have input table as below-

id col1 col2 time
01 abc 001 12:00
01 def 002 12:10

Required output table-

id col1 col2 time diff_field
01 abc 001 12:00 null
01 def 002 12:10 col1,col2

I need to compare both the rows and find all the columns for which there is difference in value and keep those column names in a new column diff_field.

I need a optimized solution for this as my table has more than 100 columns(all the columns need to be compared)

isha
  • 143
  • 1
  • 10
  • make instructions more clear? are you comparing row against row and record which columns had different values in the latter row? if so why does your output table diff_field column second row does not have "time" as values in time column are different in row1 vs row2? – Yev Guyduy Feb 07 '23 at 14:03

2 Answers2

1

Below approach has no dependency on actual columns' names or any names convention rather then only id and time

create temp function extract_keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
  """;
create temp function extract_values(input string) returns array<string> language js as """
  return Object.values(JSON.parse(input));
  """;
select t.*, 
  ( select string_agg(col)
    from unnest(extract_keys(cur)) as col with offset
    join unnest(extract_values(cur)) as cur_val with offset using(offset)
    join unnest(extract_values(prev)) as prev_val with offset using(offset)
    where cur_val != prev_val and col != 'time'
  ) as diff_field
from (
  select t, to_json_string(t) cur, to_json_string(ifnull(lag(t) over(win), t)) prev
  from your_table t
  window win as (partition by id order by time)
)     

if apply to sample data in your question (or rather extended version of it that I borrowed from Jaytiger answer) - the output is

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • It works great for me with over 200+ columns; thank you! In combination with comparing tables in the attached answer, I was able to perform unit testing to check the consistency of my query and find the differences https://stackoverflow.com/a/57912743/7224607 – arielhasidim Jun 06 '23 at 06:26
0

You might consider below approach:

WITH sample_table AS (
  SELECT '01' id, 'abc' col1, '001' col2, '12:00' time UNION ALL
  SELECT '01' id, 'def' col1, '002' col2, '12:10' time UNION ALL
  SELECT '01' id, 'def' col1, '002' col2, '12:20' time UNION ALL
  SELECT '01' id, 'ddf' col1, '002' col2, '12:30' time
)
SELECT * EXCEPT(curr, prev),
       (SELECT STRING_AGG('col' || offset) 
          FROM UNNEST(SPLIT(curr)) c WITH offset
          JOIN UNNEST(SPLIT(prev)) p WITH offset USING (offset)
         WHERE c <> p AND offset < ARRAY_LENGTH(SPLIT(curr)) - 1
       ) diff_field
  FROM (
    SELECT *, FORMAT('%t', t) AS curr, LAG(FORMAT('%t', t)) OVER w AS prev
      FROM sample_table t
    WINDOW w AS (PARTITION BY id ORDER BY time)
  );

Query results

enter image description here

Jaytiger
  • 11,626
  • 2
  • 5
  • 15