1

The problem

I have some tables with dates and numerical values; some numerical values are missing. I want those missing values to be filled in with the value available at the last date.

I need to do it:

  1. In a Sybase database, to which I only have read access; I can create temporary tables but not any permanent tables nor permanent views
  2. In a Microsoft SQL Server 2019 (Version 15) database, to which I have full access

I have found how to do it in Microsoft SQL Server, but not in Sybase.

UPDATE: If I run select @@version I get

Adaptive Server Enterprise/16.0 SP02 PL08 Instrumented

E.g. I need to go from

+-------+-------+
| date  | value |
+-------+-------+
| 1-Nov |   100 |
| 2-Nov |       |
| 3-Nov |       |
| 4-Nov |   110 |
| 5-Nov |       |
| 6-Nov |   105 |
+-------+-------+

to (see the asterisks):

+-------+--------+
| date  | value  |
+-------+--------+
| 1-Nov |   100  |
| 2-Nov |   *100 |
| 3-Nov |   *100 |
| 4-Nov |   110  |
| 5-Nov |   *110 |
| 6-Nov |   105  |
+-------+--------+

Attempt #1

I found an example here

In SQL Server, it runs without errors but it doesn't update anything. In Sybase, it doesn't run at all. I get:

incorrect syntax near the keyword top

This is a reproducible example

CREATE TABLE #my_test (my_date datetime, my_value float NULL )
go

INSERT INTO #my_test SELECT '1-Nov-2021',100
INSERT INTO #my_test SELECT '2-Nov-2021',NULL
INSERT INTO #my_test SELECT '3-Nov-2021',NULL
INSERT INTO #my_test SELECT '4-Nov-2021',110
INSERT INTO #my_test SELECT '5-Nov-2021',NULL
INSERT INTO #my_test SELECT '6-Nov-2021',105

go


UPDATE #my_test
set my_value = (
                select top 1 b.my_value
                from #my_test b
                where b.my_date < a.my_date and b.my_date = a.my_date and b.my_value is not null
                order by b.my_date desc
                
                )
                
from #my_test a
where a.my_value is null

go

Attempt #2

Outer apply works with Microsoft SQL but it doesn't seem supported in Sybase: in Sybase I get

Incorrect syntax near 'outer'

update #my_test
set my_value = coalesce(pr.my_value, nx.my_value)

from #my_test m

outer apply --next non-null value
(
select top 1 *
from #my_test x
where x.my_value is not null
and x.my_date > m.my_date
order by my_date 
) nx

outer apply -- previous non-null
(select top 1 *
from #my_test x
where x.my_value is not null
and x.my_date < m.my_date
order by my_date desc
) pr

where m.my_value is null

Attempt #3

This was in the same link as before. This code works in SQL Sevrer but Sybase tells me:

You cannot use a derived table in the FROM clause of an UPDATE or DELETE statement

My code:

update #my_test  set #my_test.my_value = tt.NewAmount 
from #my_test t
inner join (
    select my_date, coalesce(min(my_value) over (order by my_date desc ROWS BETWEEN 1 PRECEDING AND CURRENT ROW),
                           min(my_value) over (order by my_date asc ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)) NewAmount
    from #my_test t
) tt on tt.my_date = t.my_date
where t.my_value is null
markp-fuso
  • 28,790
  • 4
  • 16
  • 36
Pythonista anonymous
  • 8,140
  • 20
  • 70
  • 112
  • Honestly, you might well be better off using a linked server connection to query the Sybase database from SQL Server. Though yes, Sybase and SQL Server both use T-SQL, the former's is a much *older* version of it and doesn't support a lot of the functionality taken for granted in SQL Server. – Thom A Nov 08 '21 at 14:50
  • What exactly do you mean by "linked sever connection" and how would you go about setting one up? The Sybase server is the main database underpinning the most important data of a certain company. Access is very restricted. – Pythonista anonymous Nov 08 '21 at 14:54
  • [Create Linked Servers (SQL Server Database Engine)](https://learn.microsoft.com/en-us/sql/relational-databases/linked-servers/create-linked-servers-sql-server-database-engine?view=sql-server-ver15) – Thom A Nov 08 '21 at 14:57
  • which Sybase RDBMS (ASE? IQ? SQLAnywhere? Advantage?)? – markp-fuso Nov 08 '21 at 20:33
  • Adaptive Server Enterprise 16.0 SP02. I will update my original question to reflect this – Pythonista anonymous Nov 08 '21 at 20:37
  • four ways to forward fill: https://www.andrewvillazon.com/forward-fill-values-t-sql/ I am surprised tsql does not have a ffill api – Golden Lion Nov 09 '21 at 15:04

2 Answers2

1

This operation can be performed with some 'basic' sub-query syntax ...

One idea using a select to fill the gap (eg, OP using a select into):

select  mt1.*,isnull(mt1.my_value,
                        (select mt2.my_value
                         from   #my_test mt2
                         where  mt2.my_date = (select   max(my_date)
                                                from    #my_test mt3
                                                where   mt3.my_date < mt1.my_date
                                                and     mt3.my_value is not NULL))) as new_value
from    #my_test mt1
order by mt1.my_date           -- for display purposes
go

 my_date                         my_value    new_value
 ------------------------------- ----------- -----------
             Nov  1 2021 12:00AM         100         100
             Nov  2 2021 12:00AM        NULL         100
             Nov  3 2021 12:00AM        NULL         100
             Nov  4 2021 12:00AM         110         110
             Nov  5 2021 12:00AM        NULL         110
             Nov  6 2021 12:00AM         105         105

Assuming data is already in the #temp table and OP wants to run an update:

update  #my_test
set     my_value = (select      mt2.my_value
                        from    #my_test mt2
                        where   mt2.my_date = (select   max(mt3.my_date)
                                                from    #my_test mt3
                                                where   mt3.my_date < mt1.my_date
                                                and     mt3.my_value is not NULL))

from    #my_test mt1
where   my_value is NULL
go

select * from #my_test
order by my_date
go

 my_date                         my_value
 ------------------------------- -----------
             Nov  1 2021 12:00AM         100
             Nov  2 2021 12:00AM         100
             Nov  3 2021 12:00AM         100
             Nov  4 2021 12:00AM         110
             Nov  5 2021 12:00AM         110
             Nov  6 2021 12:00AM         105

NOTES:

  • it's not clear (to me) what we're supposed to do if the 'first' n rows in the table have my_value=NULL (ie, there's no 'previous' my_value to copy); if this is an issue OP can probably wrap the sub-select in a isnull(<sub-select>,@default_value)
  • above code tested in ASE 16.0 SP04 GA (though this should work on most (all?) versions as we're just using standard sub-query syntax)
  • for largish data volumes the queries will benefit from an index on (my_date [,my_value]); while OP may not be able to add an index to a permanent table they should be able to add an index to #my_test before performing an update (said index should be created after the initial population of the table so as to provide usable stats for my_date)
  • whether an index on #my_test will be used will depend on server and session level settings for statement_cache, literal_autoparam and deferred_name_resolution as well as volume of data in #my_test
markp-fuso
  • 28,790
  • 4
  • 16
  • 36
  • It works in Sybase, thank you! In MsSQL, what are the pros and cons of this vs the other solutions? Do I understand correctly that this approach works only when you have to fill one NULL field, not more than one? – Pythonista anonymous Nov 08 '21 at 20:55
  • 1
    I can't speak to the pros/cons of these 'basic' sub-queries vs the MSSQL constructs; you'd need to weigh performance and whether or not you want to maintain two sets of code (one for MSSQL, one for Sybase/SAP); this solution ***does*** update multiple consecutive NULLs (see `Nov 2 2021` and `Nov 3 2021`) – markp-fuso Nov 08 '21 at 21:03
0

Depending on the number of NULLS you need to backfill, you will end up to around 50% of the rows requiring an update.

In all databases I encountered, this situation calls for a CREATE TABLE ... AS SELECT, rather than a mass update. And - in my example, I avoided the use of keywords like DATE or VALUE .

Hence - here's a complete example - using LAST_VALUE( ... IGNORE NULLS) .

DROP TABLE IF EXISTS indata;
CREATE TABLE
indata(dt,val) AS (
          SELECT DATE '1-Nov-2021',100
UNION ALL SELECT DATE '2-Nov-2021',NULL
UNION ALL SELECT DATE '3-Nov-2021',NULL
UNION ALL SELECT DATE '4-Nov-2021',110
UNION ALL SELECT DATE '5-Nov-2021',NULL
UNION ALL SELECT DATE '6-Nov-2021',105
);

DROP TABLE IF EXISTS outdata;
CREATE TABLE outdata AS
SELECT
  dt
, LAST_VALUE(val IGNORE NULLS) OVER(ORDER BY dt) AS val
FROM indata
;

ALTER TABLE indata RENAME TO indata_old;
-- sp_rename indata, indata_old;
ALTER TABLE outdata  RENAME TO indata;
-- sp_rename indata, indata; -- do you rename tables like this in Sybase?
SELECT * FROM indata;
-- out      dt     | val 
-- out ------------+-----
-- out  2021-11-01 | 100
-- out  2021-11-02 | 100
-- out  2021-11-03 | 100
-- out  2021-11-04 | 110
-- out  2021-11-05 | 110
-- out  2021-11-06 | 105
marcothesane
  • 6,192
  • 1
  • 11
  • 21
  • I have tried in both Ms SQL Server and Sybase. Both give me "Incorrect syntax near ignore". Could I please ask you to double check your code? If it works for you, what SQL database are you using? – Pythonista anonymous Nov 08 '21 at 20:30
  • For context, the missing dates are Saturdays, Sundays and public holidays, so ca. 30% of the records. My tables start at 7,000ish records and might grow to 50,000 but not more than that, so not huge tables. – Pythonista anonymous Nov 08 '21 at 20:33
  • I suggest you try two scenarios: 1. use the scenario I showed, insert ... select, and rename. 2. use the target or the insert .. select as the USING table of a MERGE statement and finally decide for the faster – marcothesane Nov 09 '21 at 10:08