1

I have a sql table related to discontinuous dates: enter image description here

CREATE TABLE IF NOT EXISTS date_test1 ( items CHAR ( 8 ), trade_date date );
INSERT INTO `date_test1` VALUES ( 'a', '2020-03-20');
INSERT INTO `date_test1` VALUES ( 'b',  '2020-03-20');
INSERT INTO `date_test1` VALUES ('a', '2020-03-21');
INSERT INTO `date_test1` VALUES ( 'c', '2020-03-22');
INSERT INTO `date_test1` VALUES ( 'd',  '2020-03-22');
INSERT INTO `date_test1` VALUES ('a',  '2020-03-25');
INSERT INTO `date_test1` VALUES ( 'e',  '2020-03-26');

In this table, '2020-03-23' and '2020-03-24' are missed. I want to fill them by their previous data, in this table, the '2020-03-22' data. Expected result: enter image description here

The number of continues missing dates and of the records in one day are both uncertain. So how to do this in mysql?

Community
  • 1
  • 1
Quail Wwk
  • 25
  • 4

1 Answers1

0

This solution uses Python and assumes that there aren't so many rows that they cannot be read into memory. I do not warrant this code free from defects; use at your own risk. So I suggest you run this against a copy of your table or make a backup first.

This code uses the pymysql driver.

import pymysql
from datetime import date, timedelta
from itertools import groupby
import sys

conn = pymysql.connect(db='x', user='x', password='x', charset='utf8mb4', use_unicode=True)
cursor = conn.cursor()
# must be sorted by date:
cursor.execute('select items, trade_date from date_test1 order by trade_date, items')
rows = cursor.fetchall() # tuples: (datetime.date, str)
if len(rows) == 0:
    sys.exit(0)
groups = []
for k, g in groupby(rows, key=lambda row: row[1]):
    groups.append(list(g))
one_day = timedelta(days=1)
previous_group = groups.pop(0)
next_date = previous_group[0][1]
for group in groups:
    next_date = next_date + one_day
    while group[0][1] != next_date:
        # missing date
        for tuple in previous_group:
            cursor.execute('insert into date_test1(items, trade_date) values(%s, %s)', (tuple[0], next_date))
            print('inserting', tuple[0], next_date)
        conn.commit()
        next_date = next_date + one_day
    previous_group = group

Prints:

inserting c 2020-03-23
inserting d 2020-03-23
inserting c 2020-03-24
inserting d 2020-03-24

Discussion

With your sample data, after the rows are fetched, rows is:

(('a', datetime.date(2020, 3, 20)), ('b', datetime.date(2020, 3, 20)), ('a', datetime.date(2020, 3, 21)), ('c', datetime.date(2020, 3, 22)), ('d', datetime.date(2020, 3, 22)), ('a', datetime.date(2020, 3, 25)), ('e', datetime.date(2020, 3, 26)))

After the following is run:

groups = []
for k, g in groupby(rows, key=lambda row: row[1]):
    groups.append(list(g))

groups is:

[[('a', datetime.date(2020, 3, 20)), ('b', datetime.date(2020, 3, 20))], [('a', datetime.date(2020, 3, 21))], [('c', datetime.date(2020, 3, 22)), ('d', datetime.date(2020, 3, 22))], [('a', datetime.date(2020, 3, 25))], [('e', datetime.date(2020, 3, 26))]]

That is, all the tuples with the same date are grouped together in a list so it becomes to easier to detect missing dates.

Booboo
  • 38,656
  • 3
  • 37
  • 60
  • I hope my "use at your own risk" message doesn't unduly frighten you." This should go without saying for *any* code posted on SO that updates your assets. – Booboo Apr 05 '20 at 21:34