0

I'm trying to take value from a non-empty row and overwrite it in the subsequent rows until another non-empty row appears and then write that in the subsequent rows. Coming from ABAP Background, I'm not sure how to accomplish this in HANA SQL Script. Here's a picture to show what the data looks like.

Source Data Image

Basically 'Doe, John' should be overwritten into all the empty rows until 'Doe, Jane' appears and then 'Doe, Jane' should be overwritten into empty rows until another name appears.

My idea is to store the non-empty row in a local variable, but I haven't had much success so far. Here's my code:

tempTab1 = SELECT

  CASE WHEN EMPLOYEE <> ''
    THEN lv_emp = EMPLOYEE
  ELSE EMPLOYEE
  END AS EMPLOYEE,
FROM :tempTab;
Sandra Rossi
  • 11,934
  • 5
  • 22
  • 48

2 Answers2

0

In general, rows in dataset are unordered until you explicitly specify ORDER BY part of SQL. If you observe some order it may be a side-effect and can vary. So first of all you have to explicitly create a row number column (assume it's name is RECORD). Then you should go this way:

  1. Select only rows with non-empty data in column.
  2. Use LEAD(RECORD) over(order by RECORD) to identify the next non-empty record number.
  3. Join your source dataset to dataset defined on step 3 on between condition for RECORD field.
with a as (
  select 1 as record, 'Val1' as field1 from dummy union
  select 2 as record, '' as field1 from dummy union
  select 3 as record, '' as field1 from dummy union
  select 4 as record, 'Val2' as field1 from dummy union
  select 5 as record, '' as field1 from dummy union
  select 6 as record, '' from dummy union
  select 7 as record, '' from dummy union
  select 8 as record, 'Val3' as field1 from dummy
)
, fill_base as (
  select field1, record, lead(record, 1, record) over(order by record asc) as next_record
  from a
  where field1 <> '' and field1 is not null
)
select
  a.record
  , case
      when a.field1 = '' or a.field1 is null
      then f.field1
      else a.field1
  end as field1
  , a.field1 as field1_original
from a
  left join fill_base as f
    on a.record > f.record
      and a.record < f.next_record

The performance in HANA may be bad in some cases since it process window functions very bad.

Here is another more elegant solution with two nested window functions than does not force you to write multiple selects for each column: How to make LAG() ignore NULLS in SQL Server?

astentx
  • 6,393
  • 2
  • 16
  • 25
0

You can use window aggregate function LAST_VALUE to achieve the imputation of missing values.

Sample Data

CREATE TABLE sample (id integer, sort integer, value varchar(10));
INSERT INTO sample VALUES (4711, 1, 'Hello');
INSERT INTO sample VALUES (4712, 2, null);
INSERT INTO sample VALUES (4713, 3, null);
INSERT INTO sample VALUES (4714, 4, 'World');
INSERT INTO sample VALUES (4715, 5, null);
INSERT INTO sample VALUES (4716, 6, '!');

Generate a new column with imputed values

SELECT base.*, LAST_VALUE(fill.value ORDER BY fill.sort) AS value_imputed
FROM sample base
LEFT JOIN sample fill ON fill.sort <= base.sort AND fill.value IS NOT NULL
GROUP BY base.id, base.sort, base.value
ORDER BY base.id, base.sort

Result

enter image description here

Note that sort could be anything determining the order (e.g. a timestamp).

Mathias Kemeter
  • 933
  • 2
  • 11