SQL/Power BI How to expand table according dates

Question

I have a table like below, where a new record is created when there is a change in the status of a task.

task	status	last update
A	1	28/04/2022
A	3	01/05/2022
A	5	05/05/2022
B	1	28/04/2022
B	3	03/05/2022
B	4	05/05/2022

The problem is that I need to plot a graph within a time range, where I know the status of each item regardless of the date it was changed/created. With that, I think the easiest is to transform to the table below:

task	status	last update
A	1	28/04/2022
A	1	29/04/2022
A	1	28/04/2022
A	1	29/04/2022
A	1	30/04/2022
A	3	01/05/2022
A	3	02/05/2022
A	3	03/05/2022
A	3	04/05/2022
A	5	05/05/2022
B	1	28/04/2022
B	1	29/04/2022
B	1	30/04/2022
B	1	01/05/2022
B	1	02/05/2022
B	3	03/05/2022
B	3	04/05/2022
B	4	05/05/2022

However, I can't think of a way to do it, either directly in Power BI or even in SQL, since I'm connecting to a redshift database through a sql query. Could you please help me? Thanks

score 0 · Answer 1 · answered May 10 '22 at 15:52

0

You can create the below visual using the standard line chart visualization. In the visualization settings, go to the "Shapes" menu and turn the "Stepped" view on.

While not necessary, it may be best practice to create a date dimension table with daily values spanning from the minimum update date to the maximum update date.

Dates = CALENDAR(MIN(Tasks[last update]),MAX(Tasks[last update]))

You can then create a one to many relationship between Dates and Tasks.

answered May 10 '22 at 15:52

Strictly Funk

323
1
8

doesn't work, because I need the data day by day to build a burndown chart e some rolling sum and in your example, in the last date I need to have 2 task with differents status (the numbers is approved, completed) not sum both tasks – haraujo May 10 '22 at 16:17

score 0 · Answer 2 · answered May 11 '22 at 05:13

demo
very similar question: How to do forward fill as a PL/PGSQL function
I don't know the actual differences between amazon redshift and postgresql. The demo is based on postgresql 14. It may not works on redshift.

Basic idea:for every distinct task, get the max, min last_updated date then use the generate_series function to expand the date based on task, task's min & max last_update. key point is first_value(status), because the once you expand the date, then obviously some date the status value is null, then use partition to fill the gap. If you want deep more, you can read manual: https://www.postgresql.org/docs/14/plpgsql.html

CREATE OR REPLACE FUNCTION test_expand ()
    RETURNS TABLE (
        _date1 date,
        _first_ctask text,
        _first_cstatus bigint
    )
    AS $$
DECLARE
    distinct_task record;
    max_last_update date;
    min_last_update date;
    _sql text;
BEGIN
    FOR distinct_task IN SELECT DISTINCT
        task
    FROM
        test_1
    ORDER BY
        1 LOOP
            min_last_update := (
                SELECT
                    min(last_update)
                FROM
                    test_1
                WHERE
                    task = distinct_task.task
                LIMIT 1);
            max_last_update := (
                SELECT
                    max(last_update)
                FROM
                    test_1
                WHERE
                    task = distinct_task.task
                LIMIT 1);
            _sql := format($dml$ WITH cte AS (
                    SELECT
                        date1::date, $task$ % s$task $ AS _task, status, count(status) OVER (ORDER BY date1) AS c_s FROM (
                            SELECT
                                generate_series($a$ % s$a $::date, $b$ % s$b $::date, interval '1 day')) g (date1)
                        LEFT JOIN test_1 ON date1 = last_update)
                SELECT
                    date1, _task, first_value(status) OVER (PARTITION BY c_s ORDER BY date1, status)
                    FROM cte $dml$, distinct_task.task, min_last_update, max_last_update);
            RETURN query EXECUTE _sql;
        END LOOP;
    RETURN;
END;
$$
LANGUAGE plpgsql;

the problem here is because the max date must be to every task, so we have all the tasks until the end date. Is it possible? — haraujo, May 11 '22 at 11:21
The query make the min date, max date dynamic, grouped by task. if you min date, max date is fixed the whole query would be more easier. Please check the db fiddle link! — jian, May 11 '22 at 11:27

SQL/Power BI How to expand table according dates

2 Answers2