3

I'm working on some complex sales analytics, which is very convoluted ... and boring ...

So for this question I'll use a fun, sugary metaphor: vending machines.
But my actual tables are structured the same way.
(You can assume there's plenty of indexing, constraints, etc.)

  • BASIC TABLE #1 - INVENTORY

Let's say we have a table containing vending machine inventory data.
This table simply shows exactly how many of each type of candy are currently available in each vending machine.

I know that normally there would be an ITEM_TYPE table containing rows for 'Snickers', 'Milky Way', etc., but that isn't how our tables are constructed, for multiple reasons.
In reality, it's not a product count, but instead it's pre-aggregated sales data: "Pipeline Total", "Forecast Total", etc.
So a simple table with separate columns for different "types" of totals is what I have to work with.

For this example, I also added some text columns, to demonstrate that I have to account for a variety of datatypes.
(This complicates things.)

Except for ID, all columns are nullable - this is a real issue.
As far as we're concerned, if the column is NULL, then NULL is the official value we need to use for analytics and reporting.


CREATE table "VENDING_MACHINES" (
    "ID"                 NUMBER NOT NULL ENABLE,
    "SNICKERS_COUNT"     NUMBER,
    "MILKY_WAY_COUNT"    NUMBER,
    "TWIX_COUNT"         NUMBER,
    "SKITTLES_COUNT"     NUMBER,
    "STARBURST_COUNT"    NUMBER,
    "SWEDISH_FISH_COUNT" NUMBER,
    "FACILITIES_ADDRESS" VARCHAR2(100),
    "FACILITIES_CONTACT" VARCHAR2(100),

    CONSTRAINT "VENDING_MACHINES_PK" PRIMARY KEY ("ID") USING INDEX ENABLE
)
/

Example data:

INSERT INTO VENDING_MACHINES (ID, SNICKERS_COUNT, MILKY_WAY_COUNT, TWIX_COUNT,
                              SKITTLES_COUNT, STARBURST_COUNT, SWEDISH_FISH_COUNT,
                              FACILITIES_ADDRESS, FACILITIES_CONTACT)
SELECT 225, 11, 15, 14, 0, NULL, 13, '123 Abc Street', 'Steve' FROM DUAL UNION ALL
SELECT 349, NULL, 7, 3, 11, 8, 7, NULL, '' FROM DUAL UNION ALL
SELECT 481, 8, 4, 0, NULL, 14, 3, '1920 Tenaytee Way', NULL FROM DUAL UNION ALL
SELECT 576, 4, 2, 8, 4, 9, NULL, '', 'Angela' FROM DUAL

  • BASIC TABLE #2 - CHANGE LOG

Vending machines will periodically connect to the database and update their inventory records.
Maybe they update every time somebody buys something, or maybe they update every 30 minutes, or maybe they only update when someone refills the candy - honestly it doesn't matter.

What does matter is that whenever a record is updated in the VENDING_MACHINES table, a trigger is executed which logs every individual change in a separate log table, VENDING_MACHINE_CHANGE_LOG.
This trigger has already been written, and it works great.
(If a column is "updated" with the same value that was already present, the change should be ignored by the trigger.)

A separate row is logged for each column that was modified in the VENDING_MACHINES table (except for ID).
Therefore, if a brand new row is inserted in the VENDING_MACHINES table (i.e. a new vending machine), eight rows will be logged in the VENDING_MACHINE_CHANGE_LOG table - one for each non-ID column in VENDING_MACHINES.

(In my real-world scenario, there are 90+ columns being tracked.
But usually only one or two columns are being updated at any given time, so it doesn't get out of hand.)
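
For illustration only, a trigger of this kind might look roughly like the sketch below (trimmed to a single column; this isn't my actual trigger, and the sequence name is made up):

CREATE OR REPLACE TRIGGER VENDING_MACHINES_CHANGE_TRG
AFTER INSERT OR UPDATE ON VENDING_MACHINES
FOR EACH ROW
BEGIN
    -- One block like this per tracked column; text columns write NEW_TEXT_VALUE instead.
    IF INSERTING
       OR :new.SNICKERS_COUNT <> :old.SNICKERS_COUNT
       OR (:new.SNICKERS_COUNT IS NULL AND :old.SNICKERS_COUNT IS NOT NULL)
       OR (:new.SNICKERS_COUNT IS NOT NULL AND :old.SNICKERS_COUNT IS NULL)
    THEN
        INSERT INTO VENDING_MACHINE_CHANGE_LOG
            (ID, CHANGE_TIMESTAMP, VENDING_MACHINE_ID, MODIFIED_COLUMN_NAME, NEW_NUMBER_VALUE)
        VALUES
            (VENDING_MACHINE_CHANGE_LOG_SEQ.NEXTVAL, SYSTIMESTAMP, :new.ID,
             'SNICKERS_COUNT', :new.SNICKERS_COUNT);
    END IF;
END;
/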

This "change log" is intended to be a permanent history of the VENDING_MACHINES table, so we won't create a foreign key constraint - if a row is deleted from VENDING_MACHINES we want to retain orphaned historical records in the change log.
Also, Oracle doesn't support ON UPDATE CASCADE, so the trigger has to check for updates to the ID column and manually propagate the update throughout related tables (e.g. the change log).


CREATE table "VENDING_MACHINE_CHANGE_LOG" (
    "ID"                   NUMBER       NOT NULL ENABLE,
    "CHANGE_TIMESTAMP"     TIMESTAMP(6) NOT NULL ENABLE,
    "VENDING_MACHINE_ID"   NUMBER       NOT NULL ENABLE,
    "MODIFIED_COLUMN_NAME" VARCHAR2(30) NOT NULL ENABLE,

    "MODIFIED_COLUMN_TYPE" VARCHAR2(30) GENERATED ALWAYS AS
        (CASE "MODIFIED_COLUMN_NAME" WHEN 'FACILITIES_ADDRESS' THEN 'TEXT'
                                     WHEN 'FACILITIES_CONTACT' THEN 'TEXT'
                                     ELSE 'NUMBER' END) VIRTUAL NOT NULL ENABLE,

    "NEW_NUMBER_VALUE"     NUMBER,
    "NEW_TEXT_VALUE"       VARCHAR2(4000),

    CONSTRAINT "VENDING_MACHINE_CHANGE_LOG_CK" CHECK
        ("MODIFIED_COLUMN_NAME" IN('SNICKERS_COUNT', 'MILKY_WAY_COUNT', 'TWIX_COUNT',
                                   'SKITTLES_COUNT', 'STARBURST_COUNT', 'SWEDISH_FISH_COUNT',
                                   'FACILITIES_ADDRESS', 'FACILITIES_CONTACT')) ENABLE,

    CONSTRAINT "VENDING_MACHINE_CHANGE_LOG_PK" PRIMARY KEY ("ID") USING INDEX ENABLE,

    CONSTRAINT "VENDING_MACHINE_CHANGE_LOG_UK" UNIQUE ("CHANGE_TIMESTAMP",
                                                       "VENDING_MACHINE_ID",
                                                       "MODIFIED_COLUMN_NAME") USING INDEX ENABLE

    /* No foreign key, since we want this change log to be orphaned and preserved.
       Also, Oracle doesn't support ON UPDATE CASCADE, so the trigger propagates ID changes manually. */
)
/

Change log example data:

INSERT INTO VENDING_MACHINE_CHANGE_LOG (ID, CHANGE_TIMESTAMP, VENDING_MACHINE_ID,
                                        MODIFIED_COLUMN_NAME, NEW_NUMBER_VALUE, NEW_TEXT_VALUE)
SELECT 167, TO_TIMESTAMP('11/06/19 05:18', 'MM/DD/YY HH24:MI'), 481, 'MILKY_WAY_COUNT', 5, NULL FROM DUAL UNION ALL
SELECT 168, TO_TIMESTAMP('11/06/19 05:21', 'MM/DD/YY HH24:MI'), 225, 'SWEDISH_FISH_COUNT', 1, NULL FROM DUAL UNION ALL
SELECT 169, TO_TIMESTAMP('11/06/19 05:40', 'MM/DD/YY HH24:MI'), 481, 'FACILITIES_ADDRESS', NULL, NULL FROM DUAL UNION ALL
SELECT 170, TO_TIMESTAMP('11/06/19 05:49', 'MM/DD/YY HH24:MI'), 481, 'STARBURST_COUNT', 4, NULL FROM DUAL UNION ALL
SELECT 171, TO_TIMESTAMP('11/06/19 06:09', 'MM/DD/YY HH24:MI'), 576, 'FACILITIES_CONTACT', NULL, '' FROM DUAL UNION ALL
SELECT 172, TO_TIMESTAMP('11/06/19 06:25', 'MM/DD/YY HH24:MI'), 481, 'SWEDISH_FISH_COUNT', 7, NULL FROM DUAL UNION ALL
SELECT 173, TO_TIMESTAMP('11/06/19 06:40', 'MM/DD/YY HH24:MI'), 481, 'FACILITIES_CONTACT', NULL, 'Audrey' FROM DUAL UNION ALL
SELECT 174, TO_TIMESTAMP('11/06/19 06:46', 'MM/DD/YY HH24:MI'), 576, 'SNICKERS_COUNT', 13, NULL FROM DUAL UNION ALL
SELECT 175, TO_TIMESTAMP('11/06/19 06:55', 'MM/DD/YY HH24:MI'), 576, 'FACILITIES_ADDRESS', NULL, '388 Holiday Street' FROM DUAL UNION ALL
SELECT 176, TO_TIMESTAMP('11/06/19 06:59', 'MM/DD/YY HH24:MI'), 576, 'SWEDISH_FISH_COUNT', NULL, NULL FROM DUAL UNION ALL
SELECT 177, TO_TIMESTAMP('11/06/19 07:00', 'MM/DD/YY HH24:MI'), 349, 'MILKY_WAY_COUNT', 3, NULL FROM DUAL UNION ALL
SELECT 178, TO_TIMESTAMP('11/06/19 07:03', 'MM/DD/YY HH24:MI'), 481, 'TWIX_COUNT', 8, NULL FROM DUAL UNION ALL
SELECT 179, TO_TIMESTAMP('11/06/19 07:11', 'MM/DD/YY HH24:MI'), 349, 'TWIX_COUNT', 15, NULL FROM DUAL UNION ALL
SELECT 180, TO_TIMESTAMP('11/06/19 07:31', 'MM/DD/YY HH24:MI'), 225, 'FACILITIES_CONTACT', NULL, 'William' FROM DUAL UNION ALL
SELECT 181, TO_TIMESTAMP('11/06/19 07:49', 'MM/DD/YY HH24:MI'), 576, 'FACILITIES_CONTACT', NULL, 'Brian' FROM DUAL UNION ALL
SELECT 182, TO_TIMESTAMP('11/06/19 08:28', 'MM/DD/YY HH24:MI'), 481, 'SNICKERS_COUNT', 0, NULL FROM DUAL UNION ALL
SELECT 183, TO_TIMESTAMP('11/06/19 08:38', 'MM/DD/YY HH24:MI'), 481, 'SKITTLES_COUNT', 7, '' FROM DUAL UNION ALL
SELECT 184, TO_TIMESTAMP('11/06/19 09:04', 'MM/DD/YY HH24:MI'), 349, 'MILKY_WAY_COUNT', 10, NULL FROM DUAL UNION ALL
SELECT 185, TO_TIMESTAMP('11/06/19 09:21', 'MM/DD/YY HH24:MI'), 481, 'SNICKERS_COUNT', NULL, NULL FROM DUAL UNION ALL
SELECT 186, TO_TIMESTAMP('11/06/19 09:33', 'MM/DD/YY HH24:MI'), 225, 'SKITTLES_COUNT', 11, NULL FROM DUAL UNION ALL
SELECT 187, TO_TIMESTAMP('11/06/19 09:45', 'MM/DD/YY HH24:MI'), 225, 'FACILITIES_CONTACT', NULL, NULL FROM DUAL UNION ALL
SELECT 188, TO_TIMESTAMP('11/06/19 10:16', 'MM/DD/YY HH24:MI'), 481, 'FACILITIES_CONTACT', 4, 'Lucy' FROM DUAL UNION ALL
SELECT 189, TO_TIMESTAMP('11/06/19 10:25', 'MM/DD/YY HH24:MI'), 481, 'SNICKERS_COUNT', 10, NULL FROM DUAL UNION ALL
SELECT 190, TO_TIMESTAMP('11/06/19 10:57', 'MM/DD/YY HH24:MI'), 576, 'SWEDISH_FISH_COUNT', 12, NULL FROM DUAL UNION ALL
SELECT 191, TO_TIMESTAMP('11/06/19 10:59', 'MM/DD/YY HH24:MI'), 225, 'MILKY_WAY_COUNT', NULL, NULL FROM DUAL UNION ALL
SELECT 192, TO_TIMESTAMP('11/06/19 11:11', 'MM/DD/YY HH24:MI'), 481, 'STARBURST_COUNT', 6, 'Stanley' FROM DUAL UNION ALL
SELECT 193, TO_TIMESTAMP('11/06/19 11:34', 'MM/DD/YY HH24:MI'), 225, 'SKITTLES_COUNT', 8, NULL FROM DUAL UNION ALL
SELECT 194, TO_TIMESTAMP('11/06/19 11:39', 'MM/DD/YY HH24:MI'), 349, 'FACILITIES_CONTACT', NULL, 'Mark' FROM DUAL UNION ALL
SELECT 195, TO_TIMESTAMP('11/06/19 11:42', 'MM/DD/YY HH24:MI'), 576, 'SKITTLES_COUNT', 8, NULL FROM DUAL UNION ALL
SELECT 196, TO_TIMESTAMP('11/06/19 11:56', 'MM/DD/YY HH24:MI'), 225, 'TWIX_COUNT', 2, NULL FROM DUAL

  • DESIRED RESULT - QUERY (VIEW) TO RECONSTRUCT HISTORICAL TABLE ROWS FROM CHANGE LOG

I need to build a view that reconstructs the complete historical VENDING_MACHINES table, using only data from the VENDING_MACHINE_CHANGE_LOG table.
i.e. Since the change log rows are allowed to be orphaned, rows that have previously been deleted from VENDING_MACHINES should reappear.
The resulting view should allow me to retrieve any VENDING_MACHINES row, exactly as it existed at any specific point in history.

The example data for VENDING_MACHINE_CHANGE_LOG is very short, and not quite enough to produce a complete result ...
But it should be enough to demonstrate the desired outcome.

Ultimately I think analytical functions will be required.
But I'm new to SQL analytical functions, and I'm new to Oracle and Apex as well.
So I'm not sure how to approach this - What's the best way to reconstruct the original table rows?

Here's what the desired result should look like (sorted by CHANGE_TIMESTAMP):

[screenshot of the desired result, sorted by CHANGE_TIMESTAMP]

And here is the same desired result, additionally sorted by VENDING_MACHINE_ID:

[screenshot of the desired result, sorted by VENDING_MACHINE_ID and CHANGE_TIMESTAMP]

I have built a simple query to pull the most recent column values for every VENDING_MACHINE_ID, but I don't think this method is suitable for this enormous task.
I think I need to use analytical functions instead, to get better performance and flexibility. (Or maybe I'm wrong?)

select vmcl.ID,
       vmcl.CHANGE_TIMESTAMP,
       vmcl.VENDING_MACHINE_ID,
       vmcl.MODIFIED_COLUMN_NAME,
       vmcl.MODIFIED_COLUMN_TYPE,
       vmcl.NEW_NUMBER_VALUE,
       vmcl.NEW_TEXT_VALUE

from ( select sqvmcl.VENDING_MACHINE_ID,
              sqvmcl.MODIFIED_COLUMN_NAME,
              max(sqvmcl.CHANGE_TIMESTAMP) as LAST_CHANGE_TIMESTAMP
       from VENDING_MACHINE_CHANGE_LOG sqvmcl
       where sqvmcl.CHANGE_TIMESTAMP <= /*[Current timestamp, or specified timestamp]*/
       group by sqvmcl.VENDING_MACHINE_ID, sqvmcl.MODIFIED_COLUMN_NAME ) sq

left join VENDING_MACHINE_CHANGE_LOG vmcl on vmcl.VENDING_MACHINE_ID = sq.VENDING_MACHINE_ID
                                         and vmcl.MODIFIED_COLUMN_NAME = sq.MODIFIED_COLUMN_NAME
                                         and vmcl.CHANGE_TIMESTAMP = sq.LAST_CHANGE_TIMESTAMP

Notice the left join specifically hits the unique index for the VENDING_MACHINE_CHANGE_LOG table - this is by design.

Giffyguy
  • The desired result that you showed in the question is a complete list of changes. Do you really need the full set of all changes, or do you need to reconstruct what the table looked like at a **specific moment in time** (just for that moment)? – Vladimir Baranov Dec 01 '19 at 23:56
  • Another question. You said that vending machines may come and go, i.e. the set of IDs `VENDING_MACHINE_ID` is not fixed. What about the set of columns? Can you say that you have a fixed set of columns `SNICKERS_COUNT, MILKY_WAY_COUNT, TWIX_COUNT, SKITTLES_COUNT, STARBURST_COUNT, SWEDISH_FISH_COUNT, FACILITIES_ADDRESS, FACILITIES_CONTACT` that never changes? (And if it does change, you are happy to update your SQL code accordingly.) – Vladimir Baranov Dec 02 '19 at 00:00
  • @VladimirBaranov You're correct, obviously I would filter down to a desired window of timeframes. But that's why it's a view, so I can just pretend like I'm pulling the entire list, with the understanding that it will be filtered later. – Giffyguy Dec 02 '19 at 00:08
  • @VladimirBaranov Your second question, yes, the columns are fixed. And if we change the columns at some point, I'm happy to update the SQL to match it. I expect there's no practical way around this, and it's not a problem. – Giffyguy Dec 02 '19 at 00:11
  • I may have no real experience regarding your situation. But what you want can't be done using standard SQL. However, if you want a way to see your whole table in a state it used to be in, why not add a (status or deleted) column? This can be a simple boolean that indicates whether a row is deleted/active/etc. You can then save all your **deleted** rows (going off your question this does not seem like it would be a lot of rows) whilst maintaining the main functionality that you currently have going on. – Tomm Dec 02 '19 at 07:47
  • @Tomm That's only *part* of the desired results. I need to be able to track all updates, inserts, *and* deletes. So if a column gets overwritten, we still have a log of the old and new values. That's what this change log provides. You can't emulate that with a simple flag column on the main table. But for the purposes of this question, I actually don't need anyone to worry about deleted rows - that's a bridge I can cross later (I've already put some work into it, actually). – Giffyguy Dec 02 '19 at 07:56
  • Fair enough, I have misunderstood or forgotten about that whilst writing my comment. It was quite a long read, with a lot of data to consume. – Tomm Dec 02 '19 at 07:58

4 Answers

3

I am going to ignore my feeling that this is an "XY problem" and answer this question only:

[How do I] Reconstruct historical table rows, based only on change-log data[?]

(For an approach for what I suspect might be the "real" problem, see this link about Flashback Archives in Oracle 12c: https://docs.oracle.com/database/121/ADFNS/adfns_flashback.htm#ADFNS01004)
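
In case it helps, that approach boils down to something like the sketch below (names are made up, and it needs one-time setup and privileges as described in the linked docs):

CREATE FLASHBACK ARCHIVE VENDING_FBA TABLESPACE USERS RETENTION 1 YEAR;

ALTER TABLE VENDING_MACHINES FLASHBACK ARCHIVE VENDING_FBA;

-- Later: a row exactly as it existed at a given point in time
SELECT *
FROM   VENDING_MACHINES
       AS OF TIMESTAMP TO_TIMESTAMP('2019-11-06 07:00', 'YYYY-MM-DD HH24:MI')
WHERE  ID = 481;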

For what you've got, I believe this is the query (for your view definition) that you are looking for:

SELECT 
    c.id change_id,
    c.change_timestamp as_of_timestamp,
    c.vending_machine_id,
    NULLIF(last_value(case when c.modified_column_name = 'SNICKERS_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) snickers_count,
    NULLIF(last_value(case when c.modified_column_name = 'MILKY_WAY_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) MILKY_WAY_COUNT,
    NULLIF(last_value(case when c.modified_column_name = 'TWIX_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) TWIX_COUNT,
    NULLIF(last_value(case when c.modified_column_name = 'SKITTLES_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) SKITTLES_COUNT,
    NULLIF(last_value(case when c.modified_column_name = 'STARBURST_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) STARBURST_COUNT,
    NULLIF(last_value(case when c.modified_column_name = 'SWEDISH_FISH_COUNT' THEN nvl(c.new_number_value,-99999) ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),-99999) SWEDISH_FISH_COUNT,
    NULLIF(last_value(case when c.modified_column_name = 'FACILITIES_ADDRESS' THEN nvl(c.new_text_value,'#NULL#') ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),'#NULL#') FACILITIES_ADDRESS,
    NULLIF(last_value(case when c.modified_column_name = 'FACILITIES_CONTACT' THEN nvl(c.new_text_value,'#NULL#') ELSE NULL END) ignore nulls over ( partition by c.vending_machine_id order by c.change_timestamp asc range between unbounded preceding and current row),'#NULL#') FACILITIES_CONTACT
FROM 
    VENDING_MACHINE_CHANGE_LOG c
ORDER BY 
    c.vending_machine_id, c.change_timestamp;

Basically, you have three problems:

  1. How do you account for different data types that might be stored in each column?
  2. How do you account for null values?
  3. How do you make the query run efficiently?

Problem #1 is answered by the fact that you are writing the logic for each view column by hand, so it is simple for the view definition to use NEW_NUMBER_VALUE for, say, the SNICKERS_COUNT column and NEW_TEXT_VALUE for the FACILITIES_ADDRESS column.

Problem #2 is trickier. Consider the SNICKERS_COUNT column. You need to ignore changes that are not changes to SNICKERS_COUNT. It is easy to ignore those by making them null. However, the actual change values may also be null, and we don't want to ignore those. So, we must designate a non-null value to stand in for null values that we don't want to ignore. This designated value must be one that can never appear in the actual data. For numeric columns, I've chosen -99999 and for text columns, I've chosen '#NULL#'.

Problem #3 I've ignored. The very nature of your question is going to require that you read all the change logs from the beginning of time to build up the values as of a given point in time. I don't see you getting off without a full table scan of VENDING_MACHINE_CHANGE_LOG.

So, let's break down one of the columns in the query to see what it is doing:

nullif(
  last_value(
     case when c.modified_column_name = 'SNICKERS_COUNT' 
          THEN nvl(c.new_number_value,-99999) 
          ELSE NULL END) 
  ignore nulls 
  over ( partition by c.vending_machine_id 
         order by c.change_timestamp asc 
         range between unbounded preceding and current row)
 ,-99999) snickers_count,

Start with this inner expression:

case when c.modified_column_name = 'SNICKERS_COUNT' 
              THEN nvl(c.new_number_value,-99999) 
              ELSE NULL END

If the modified column is not SNICKERS_COUNT, the expression is NULL. That is the only way it can be null. If the new_number_value was NULL, we convert it to our designated stand-in (-99999).

Then,

last_value(...case expression above...)
  ignore nulls 
  over ( partition by c.vending_machine_id 
         order by c.change_timestamp asc 
         range between unbounded preceding and current row)

... this tells Oracle to take the most recent non-null value for the case expression, with "most recent" being defined as the row with the highest change_timestamp for the set of rows that have the same vending_machine_id as the current row and only including changes up to the current row.

And finally,

nullif(... last_value expression above...
 ,-99999) snickers_count

This converts the designated stand-in value for null back to a true null.

Here are the results:

+-----------+---------------------------------+--------------------+----------------+-----------------+------------+----------------+-----------------+--------------------+--------------------+--------------------+
| CHANGE_ID |         AS_OF_TIMESTAMP         | VENDING_MACHINE_ID | SNICKERS_COUNT | MILKY_WAY_COUNT | TWIX_COUNT | SKITTLES_COUNT | STARBURST_COUNT | SWEDISH_FISH_COUNT | FACILITIES_ADDRESS | FACILITIES_CONTACT |
+-----------+---------------------------------+--------------------+----------------+-----------------+------------+----------------+-----------------+--------------------+--------------------+--------------------+
|       168 | 06-NOV-19 05.21.00.000000000 AM |                225 |                |                 |            |                |                 |                  1 |                    |                    |
|       180 | 06-NOV-19 07.31.00.000000000 AM |                225 |                |                 |            |                |                 |                  1 |                    | William            |
|       186 | 06-NOV-19 09.33.00.000000000 AM |                225 |                |                 |            |             11 |                 |                  1 |                    | William            |
|       187 | 06-NOV-19 09.45.00.000000000 AM |                225 |                |                 |            |             11 |                 |                  1 |                    |                    |
|       191 | 06-NOV-19 10.59.00.000000000 AM |                225 |                |                 |            |             11 |                 |                  1 |                    |                    |
|       193 | 06-NOV-19 11.34.00.000000000 AM |                225 |                |                 |            |              8 |                 |                  1 |                    |                    |
|       196 | 06-NOV-19 11.56.00.000000000 AM |                225 |                |                 |          2 |              8 |                 |                  1 |                    |                    |
|       177 | 06-NOV-19 07.00.00.000000000 AM |                349 |                |               3 |            |                |                 |                    |                    |                    |
|       179 | 06-NOV-19 07.11.00.000000000 AM |                349 |                |               3 |         15 |                |                 |                    |                    |                    |
|       184 | 06-NOV-19 09.04.00.000000000 AM |                349 |                |              10 |         15 |                |                 |                    |                    |                    |
|       194 | 06-NOV-19 11.39.00.000000000 AM |                349 |                |              10 |         15 |                |                 |                    |                    | Mark               |
|       167 | 06-NOV-19 05.18.00.000000000 AM |                481 |                |               5 |            |                |                 |                    |                    |                    |
|       169 | 06-NOV-19 05.40.00.000000000 AM |                481 |                |               5 |            |                |                 |                    |                    |                    |
|       170 | 06-NOV-19 05.49.00.000000000 AM |                481 |                |               5 |            |                |               4 |                    |                    |                    |
|       172 | 06-NOV-19 06.25.00.000000000 AM |                481 |                |               5 |            |                |               4 |                  7 |                    |                    |
|       173 | 06-NOV-19 06.40.00.000000000 AM |                481 |                |               5 |            |                |               4 |                  7 |                    | Audrey             |
|       178 | 06-NOV-19 07.03.00.000000000 AM |                481 |                |               5 |          8 |                |               4 |                  7 |                    | Audrey             |
|       182 | 06-NOV-19 08.28.00.000000000 AM |                481 |              0 |               5 |          8 |                |               4 |                  7 |                    | Audrey             |
|       183 | 06-NOV-19 08.38.00.000000000 AM |                481 |              0 |               5 |          8 |              7 |               4 |                  7 |                    | Audrey             |
|       185 | 06-NOV-19 09.21.00.000000000 AM |                481 |                |               5 |          8 |              7 |               4 |                  7 |                    | Audrey             |
|       188 | 06-NOV-19 10.16.00.000000000 AM |                481 |                |               5 |          8 |              7 |               4 |                  7 |                    | Lucy               |
|       189 | 06-NOV-19 10.25.00.000000000 AM |                481 |             10 |               5 |          8 |              7 |               4 |                  7 |                    | Lucy               |
|       192 | 06-NOV-19 11.11.00.000000000 AM |                481 |             10 |               5 |          8 |              7 |               6 |                  7 |                    | Lucy               |
|       171 | 06-NOV-19 06.09.00.000000000 AM |                576 |                |                 |            |                |                 |                    |                    |                    |
|       174 | 06-NOV-19 06.46.00.000000000 AM |                576 |             13 |                 |            |                |                 |                    |                    |                    |
|       175 | 06-NOV-19 06.55.00.000000000 AM |                576 |             13 |                 |            |                |                 |                    | 388 Holiday Street |                    |
|       176 | 06-NOV-19 06.59.00.000000000 AM |                576 |             13 |                 |            |                |                 |                    | 388 Holiday Street |                    |
|       181 | 06-NOV-19 07.49.00.000000000 AM |                576 |             13 |                 |            |                |                 |                    | 388 Holiday Street | Brian              |
|       190 | 06-NOV-19 10.57.00.000000000 AM |                576 |             13 |                 |            |                |                 |                 12 | 388 Holiday Street | Brian              |
|       195 | 06-NOV-19 11.42.00.000000000 AM |                576 |             13 |                 |            |              8 |                 |                 12 | 388 Holiday Street | Brian              |
+-----------+---------------------------------+--------------------+----------------+-----------------+------------+----------------+-----------------+--------------------+--------------------+--------------------+
Matthew McPeak
  • This looks great. Your code, more or less, is the same as what I've been working on this morning - except I've been using regular aggregates, because I'm terrible with analytics. (Thanks for giving me a good analytic example!) I'm immediately looking into Flashback Archives, which will definitely be better than my current garbage code. Hopefully I can figure out how to implement Flashback Archives in Apex ... although my understanding is Apex *SHOULD* support all Oracle DB features? I'm going to find out ASAP. – Giffyguy Dec 02 '19 at 19:03
  • @Giffyguy I don't do much with APEX, but this link seems to indicate there is built-in support for flashback queries: https://docs.oracle.com/database/apex-5.1/AEEUG/executing-flashback-query.htm#AEEUG-GUID-E58A8DC7-555D-491D-A138-97B3FCFFBC5F. – Matthew McPeak Dec 02 '19 at 19:21
2

My proposal would actually be to change the LOG table entirely, instead of just logging which column changed and what it changed to. Every time a row is updated, you insert the old row into the LOG table, along with a marker for INSERT, UPDATE, or DELETE, a timestamp, and a log_id.

Then, when you want to know the state of the table at a certain time, you simply run a query (or construct a simple view to simplify this even further) that selects, for each distinct vending machine, the log entry with the max timestamp before the desired date. Basically, select the most recent log entry for that vending machine that is still before your desired date (and if the most recent entry is a delete, don't show it).

This way of doing things would simplify things substantially; it would take up slightly more space (but space is cheap nowadays), and your update trigger would probably take a slight performance hit. Not to mention that this would also perfectly handle the inserted and deleted rows issue. But the view you make on this table should be very quick, and I would bet it would be a lot faster than anything you cobble together with the current log table.
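
To make this concrete, here is a rough sketch (all names invented; it logs the row image as it stands after each change - for a delete, the deleted row - which makes the point-in-time query a single pass):

-- Row-image log: one row per INSERT/UPDATE/DELETE of a vending machine
CREATE TABLE VENDING_MACHINES_ROW_LOG (
    LOG_ID             NUMBER       NOT NULL,
    LOG_TIMESTAMP      TIMESTAMP(6) NOT NULL,
    OPERATION          VARCHAR2(1)  NOT NULL,   -- 'I', 'U' or 'D'
    VENDING_MACHINE_ID NUMBER       NOT NULL,
    SNICKERS_COUNT     NUMBER,
    MILKY_WAY_COUNT    NUMBER,
    -- ... the remaining VENDING_MACHINES columns ...
    CONSTRAINT VENDING_MACHINES_ROW_LOG_PK PRIMARY KEY (LOG_ID)
);

-- State of every machine as of :as_of: latest entry per machine, hiding deletes
SELECT *
FROM  (SELECT l.*,
              ROW_NUMBER() OVER (PARTITION BY VENDING_MACHINE_ID
                                 ORDER BY LOG_TIMESTAMP DESC) AS rn
       FROM   VENDING_MACHINES_ROW_LOG l
       WHERE  LOG_TIMESTAMP <= :as_of) x
WHERE  x.rn = 1
AND    x.OPERATION <> 'D';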

If you have to use the current log table though, I'm not sure a VIEW would cut it. I think what you would need is to make another temporary table identical to your existing VENDING_MACHINES table, and then, when you enter a date for which you want the data, run some PL/SQL.

Then we are met with a problem because your LOG table is recording new values and not old values.

So what I would do is run a PL/SQL procedure: select all distinct changes after your desired date (if one machine updated its Snickers count 13 times, only take one of those) in order to find everything that has changed since your desired date. Then go find the last time each of those columns was updated or inserted before your desired date and get the value from there. This will require some dynamic SQL magic, and it's going to be a pain to code and a pain to run.

So if you can't do the whole table change I proposed but can still change the trigger, insert the OLD values into the LOG table - the new record is stored in the VENDING_MACHINES table anyway. In that case you would probably still need to create a copy of the VENDING_MACHINES table, but this time the PL/SQL procedure would be much simpler, because you simply loop through all the logs after your date, from most recent to oldest, and for each change you run a simple dynamic SQL statement to reverse it.

I would strongly suggest you go with the first method of changing how your LOG table is formed, because that will be a whole lot simpler, easier to implement, and a lot faster to run.

EDIT: Thought of another way you could go about things. First, you would set up a view that changes how the LOG table is presented, so that it has the same form as the VENDING_MACHINES table (same columns, etc.). This would be quite simple and would look something like this:

SELECT id AS change_id, change_timestamp, vending_machine_id,
       CASE WHEN modified_column_name = 'SNICKERS_COUNT'  THEN new_number_value ELSE NULL END AS snickers_count,
       CASE WHEN modified_column_name = 'MILKY_WAY_COUNT' THEN new_number_value ELSE NULL END AS milky_way_count,
       .....
       CASE WHEN modified_column_name = 'FACILITIES_ADDRESS' THEN new_text_value ELSE NULL END AS facilities_address
  FROM vending_machine_change_log

Then you set up another view on top of this that actually gets you to your desired date. The new view is structured like the original VENDING_MACHINES table, with distinct vending_machine_ids, but for each column you select the value from the first view where the timestamp is the most recent and the value is not null (i.e. the most recent change to that column). You will need a special case for when the change to the column actually was to set it to NULL: the first view could include an NVL when the column is changed, so that a change to NULL is stored as a value that would never be set normally, and the second view then checks for that value and translates it back to NULL.
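
For one column, that second view could look something like this sketch (assuming the pivoted view above is saved as LOG_PIVOTED - a made-up name - and that it stores -99999 as the stand-in for "explicitly changed to NULL"; every other column follows the same pattern):

SELECT VENDING_MACHINE_ID,
       NULLIF(MAX(SNICKERS_COUNT)
                KEEP (DENSE_RANK LAST
                      ORDER BY CASE WHEN SNICKERS_COUNT IS NOT NULL
                                    THEN CHANGE_TIMESTAMP END NULLS FIRST),
              -99999) AS SNICKERS_COUNT
       -- ... same expression for each remaining column ...
FROM   LOG_PIVOTED
WHERE  CHANGE_TIMESTAMP <= :as_of   -- omit this filter if you make it a plain view
GROUP BY VENDING_MACHINE_ID;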

If you want to see how the row looked at every change, you could structure the view so that, for every change, it runs the same select on each column as above.

This solution is less efficient than my original one that changes the log table, but a whole lot better than the other ones I thought of. It actually fits your needs quite well. If you like my idea but want any clarifications, let me know.

TineO
  • Thanks for this! It looks very promising. I'm going to start working on this immediately. I'll let you know how it goes, or if I have any questions. – Giffyguy Dec 02 '19 at 14:54
1

While of course this can be done, this cannot be done efficiently using standard SQL because your tables violate the basic rules of a relational database. Specifically, you have a column in one table referenced by the text representation of its name in another. This is a critical relation that is not captured in the relation metadata.

So given that it cannot be done efficiently, the question is what level of inefficiency you can tolerate and what trade-offs you want to make. Usually these kinds of change logs are written because it was considered that the full historical table would be too big, but it is often the case that that decision was made a long time ago, and now that we are firmly in the era of "big data", what used to be "too big" is now "no problem".

  • If the size of the resulting table is tolerable, I would favor creating a fully materialized table containing the information you are interested in.
  • If the full table is too big, can you make it a suitable size by reducing the granularity of the timestamp? Have one row per day or per week rather than one for every update.
  • Alternately, can you reduce the size by limiting the time range, say to only the past 90 days?

If any of those options are suitable, then you can run a data warehousing process (essentially an ETL process) at some regular interval (e.g. nightly) to create and update the table. Note that the resulting table does not need to be in the same database if you do not need to join against it. You can also modify the trigger, or create a new trigger, to keep the expanded table up to date after you create it manually.
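
For example, if daily granularity is enough, a nightly snapshot job could be sketched like this (all names invented; a real ETL process would be more involved):

-- One row per machine per day (hypothetical snapshot table)
CREATE TABLE VENDING_MACHINES_DAILY AS
SELECT TRUNC(SYSDATE) AS SNAPSHOT_DATE, vm.*
FROM   VENDING_MACHINES vm
WHERE  1 = 0;

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'VM_DAILY_SNAPSHOT',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN
                          INSERT INTO VENDING_MACHINES_DAILY
                          SELECT TRUNC(SYSDATE), vm.* FROM VENDING_MACHINES vm;
                          COMMIT;
                        END;',
    repeat_interval => 'FREQ=DAILY; BYHOUR=1',
    enabled         => TRUE);
END;
/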

Otherwise you are going to have a very difficult time doing this dynamically, because you will have to be explicit in the SQL about mapping values to columns for each column.

Old Pro
  • Understood, yeah the explicit mapping is probably required, and it's fine. We're OK with explicit mapping, since we won't have to revisit this code very often (if ever). To answer your questions further, we offered data warehousing as a solution, and the boss vetoed it. He said we have to do it this way, so it's what we're stuck with. – Giffyguy Dec 02 '19 at 00:14
  • @Giffyguy what exactly did the boss veto? I think replacing the log table with the fully materialized table you want to query is the way to go if the resulting table is a manageable size. That's a 1-time conversion plus a relatively easy change to the trigger that updates the table. – Old Pro Dec 02 '19 at 20:28
  • I absolutely agree. But they've had that solution in the past, and it happened to be very buggy and broken at the time, so they're biased against it (unreasonably). I just got out of a meeting with the boss, and I told him about Oracle Flashbacks, and he was much more interested in that (I am too). I'll be putting in some requests with our DBAs to see if we can enable Flashbacks, and if so, we'll try to use that instead. – Giffyguy Dec 02 '19 at 22:15
1

There are a few options; none of them are particularly pleasant, especially if you've already got a lot of data.

In my opinion, the cleanest way of doing this is to accept that time plays a key role in your business domain, and bake it into your schema design without relying on logs. This is the academic basis for my recommendation - see also this answer on Stack Overflow. In your case, I'd add 3 columns to VENDING_MACHINES:

status int not null
valid_from datetime not null
valid_until datetime null

Status tracks whether a machine is active or "deleted". You never delete records; you merely set their status to "deleted". valid_from marks the moment from which the record is valid; valid_until marks the moment when the record is overwritten. When something changes for a vending machine, you set valid_until on the current record to SYSTIMESTAMP and insert a new record with valid_from = SYSTIMESTAMP. This allows you to see the state of the machine at any point in time; the current state is reflected by the rows where valid_until is null. You no longer need the log table.

The drawback of this method is that you may have a fair bit of data access code to rewrite, and all your joins need to include the time logic. This is good if you want to reflect time in your business logic (e.g. "what was the value of unsold Snickers bars as of 1 January" requires you to know both how many Snickers bars there were at that time and what the price of a Snickers was on that date). If that really is too much hassle, you can create a view restricted to valid_until is null and status = 1.
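
A rough sketch of that pattern (Oracle-flavoured; bind names are placeholders, and note that ID alone can no longer be the primary key once a machine has several versions):

-- Supersede the current version and write the new one
UPDATE vending_machines
SET    valid_until = SYSTIMESTAMP
WHERE  id = :machine_id
AND    valid_until IS NULL;

INSERT INTO vending_machines
    (id, snickers_count, /* ... other columns ... */ status, valid_from, valid_until)
VALUES
    (:machine_id, :new_snickers_count, /* ... */ 1, SYSTIMESTAMP, NULL);

-- The version of a machine that was valid at :as_of
SELECT *
FROM   vending_machines
WHERE  id = :machine_id
AND    valid_from <= :as_of
AND    (valid_until > :as_of OR valid_until IS NULL);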

The next option would be to modify your trigger to accommodate this logic. I'm not a huge fan of triggers with lots of business logic as the performance impact can be unpredictable.

Neville Kuyt