
I am making a REST API that receives a list of instructions that a specific thread on the server should execute over a 24-hour period, call it a daily schedule. The same instruction is executed during a time interval:

[
    {
        instructionName: string
        args : [    
            string
            ...         
        ]
        startHh : int
        startMm : int
        endHh : int
        endMm : int
    }   
    ...   
]

The content of args varies depending on instructionName.

The schedule should be persisted in MySQL. Every x seconds the thread should ask the DB for the current instruction and do some work.

My issue is that I am not sure what is the best option for storing the list of instructions in the database. As I see it I have two options:

[Image: the two candidate schemas: (1) a single instruction table with the args stored as one string; (2) a separate table per instruction type with the args split into individual columns]

Using the first approach, all I have to do is concatenate args into a single string, parse the JSON directly into a DTO object and persist it. I have to be careful not to store instructionNames and args that the worker thread later can't interpret. The worker thread can easily query the instruction table and get the current instruction for the current time interval.
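Roughly, the first option would look something like this (the table and column names are just placeholders, and the args value is an arbitrary example):

-- Single table holding every instruction type (option 1); names are illustrative only
CREATE TABLE instruction (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    name     VARCHAR(64)  NOT NULL,  -- instructionName
    args     VARCHAR(255) NOT NULL,  -- args concatenated / serialized into one string
    start_hh TINYINT      NOT NULL,
    start_mm TINYINT      NOT NULL,
    end_hh   TINYINT      NOT NULL,
    end_mm   TINYINT      NOT NULL
);

-- Example row: a hypothetical "backup" instruction running 01:00-02:30
INSERT INTO instruction (name, args, start_hh, start_mm, end_hh, end_mm)
VALUES ('backup', '["/var/data","full"]', 1, 0, 2, 30);

-- Worker thread: fetch whatever instruction covers the current time of day
SELECT id, name, args
FROM   instruction
WHERE  (start_hh * 60 + start_mm) <= (HOUR(NOW()) * 60 + MINUTE(NOW()))
AND    (HOUR(NOW()) * 60 + MINUTE(NOW())) < (end_hh * 60 + end_mm);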

In the second approach I first have to figure out the correct table using instructionName, check whether the args are valid for that table, and then insert the row. The worker thread has no simple way of getting the current instruction because the instructions are spread across different tables. Once the worker thread has figured out which table to query, it can be sure that the args are properly formatted, since they are split into individual columns.

In my application there are going to be many types of instructions, and new instruction types will be added continuously during the lifetime of the application.

It seems that both approaches have big problems, and I can't figure out the best one for my particular use case. I am wondering if I should even use a relational database at all for this kind of data.

Any input is appreciated.

Tagor

2 Answers


In my application there are going to be many types of instructions and new instruction types will be added continuously during the lifetime of the application.

Which solution to choose depends a bit on your definitions of many and continuously. A database is extremely good at querying existing data. It is fairly good at changing stored data and adding new data. It is lousy at changing the database layout. So you should try to avoid changing the layout.

If changing continuously means several times a day, I wouldn't advise making a table per application.

Likewise, if many means thousands of applications / parameter configurations, then a table per application would lead to thousands of tables, which is fairly undesirable.

On the other hand, if you choose your first method, then, as you stated, you will have to take care of proper storage of the command and its parameters.

If you have some repository pattern which takes care of the use cases of your problem, then you could let the repository check the parameters before they are stored into your database.

void ScheduleBackupTask(TimeSpan startTime, TimeSpan stopTime, ... <backup parameters>)
{
     // check the parameter list, to see if they match the parameters of a backup task
     // and create the command
     var command = CreateBackupCommand(<backup parameters>);
     ScheduleCommand(startTime, stopTime, command);
}

void ScheduleCleaningTask(TimeSpan startTime, TimeSpan stopTime, <cleaning parameters>)
{
    // check the parameter list, to see if they match the parameters of a clean task
    // and create the command
    var command = CreateCleanCommand(<cleaning parameters>);
    ScheduleCommand(startTime, stopTime, command);
}

void ScheduleCommand(TimeSpan startTime, TimeSpan stopTime, Command command)
{
      using (var dbContext = new MyDbContext())
      {
           Schedule schedule = new Schedule(startTime, stopTime, command);
           dbContext.Schedules.Add(schedule);
           dbContext.SaveChanges();
      }
}

Every time you have to support a new command, or have to change the parameters of a command, you'll have to create or change the corresponding Create...Command function. There is only one place where you'll have to check the parameters.

Even if you had chosen your second solution, you would have needed a function that checks your parameters and puts them in the correct order, so your second solution wouldn't have helped there.

Executing the command

Obviously, querying which commands must be executed is easier and faster using your first method. Once you've fetched the command, including a commandType, it is easy to execute it:

IEnumerable<Command> commandsToExecute = FetchCommandsToExecute(DateTime.Now.TimeOfDay);
foreach (Command command in commandsToExecute)
{
     switch (command.CommandType)
     {
          case CommandType.Backup:
               ExecuteBackup(...);
               break;
          case CommandType.Clean:
               ExecuteClean(...);
               break;
     }
}

Obviously you will have to change the switch when supporting a new command. However, in your second solution you would also have to change the execution function.

Summarized: if you expect a lot of commands to support, or regularly changing parameters or kinds of commands, my advice would be to have one table containing all commands. Let your repository pattern check the parameters before adding / updating / executing.

Harald Coppoolse
  • Thanks for the extensive answer that also includes how to execute the instruction. A kind of instruction is not going to change once it has been introduced; the format of the arguments will always stay the same for that instruction type. It is more about supporting new instruction types easily. But still, I am going to follow your advice. It seems best for my use case not to have the instruction types separated in different tables. – Tagor Nov 01 '18 at 10:44

Any model functions when usage is low enough. But when usage is substantial, DDL such as "ALTER TABLE" to add new arguments becomes, if not prohibitively, then painfully expensive. Table schemas should be made to change as little as possible. Thus, I'd prefer your first option if I had to choose between the two.

Whether it's good enough, to me, depends upon whether you really want to query the arguments and potentially model highly dynamic data, or whether you simply need a place to shove some text and don't particularly care what's in it.

Suppose, for example, you want to be able to answer questions such as "for which executions of a particular job was fuelCount set to 3?". For that kind of question, you'd need to find fuelCount's existence and its value in an essentially unstructured text string. It would take inelegant gymnastics to keep the analysis on the server, but pulling every row back to a MySQL client to parse the arguments is similarly untenable with all but the smallest of data sets.
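To illustrate, with the args stored as one opaque string, that question degenerates into text pattern matching along these lines (table and column names are only placeholders, and the query only works if the string happens to be serialized exactly as the pattern expects):

-- Fragile: depends entirely on how the args string happens to be formatted
SELECT id, name, args
FROM   instruction
WHERE  args LIKE '%"fuelCount": 3%'
   OR  args LIKE '%"fuelCount":3%';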

Another option would be to use the MySQL JSON features, if your database version supports them. They allow you to model the arguments however you want, without having to change the table format when novel values arise. You do, however, have to be careful that old queries don't break when the model changes. JSON support in MySQL means being able to query the data inside the JSON without having to retrieve, parse, and aggregate all the individual records in the database client, so it's a pretty neat feature. Postgres also has it.

For example, you could save arbitrary data about the command to be run in a runtime JSON column, and as long as you kept to some simple rules, you could have an optional entry for the arguments, as well as (for example) environment variables that might also be required by the program. With time, other runtime settings may arise that compel you to add more arguments to certain jobs; the JSON type is pretty great for this. If you want to query the JSON, you can. This lets you impose some structure on the data (all arguments live under the args key of the top-level dictionary, for example) while not having to predefine every argument that might be passed.
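A rough sketch of what that could look like (requires a recent MySQL 5.7 or later; the table, column, and key names are only an example):

-- One JSON column holds whatever per-job settings a given instruction needs
CREATE TABLE scheduled_job (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    name    VARCHAR(64) NOT NULL,
    runtime JSON        NOT NULL
);

INSERT INTO scheduled_job (name, runtime)
VALUES ('refuel',
        '{"args": ["tank1", "tank2"], "env": {"PATH": "/usr/bin"}, "fuelCount": 3}');

-- The fuelCount question from earlier becomes a plain server-side query
SELECT id, name, runtime->>'$.args' AS args
FROM   scheduled_job
WHERE  runtime->>'$.fuelCount' = '3';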

It's well exemplified in the link above, if the idea strikes you as a good one. You already seem to be thinking in something like JSON, so that might be an easy transition. This also has the benefit of being very web friendly: if you're building a REST API, you're probably already planning to exchange JSON anyway.

If you're stuck on an older version of MySQL and can't get off of it, or are otherwise put off by those queries, I'd still suggest sticking to the first approach. If you want to take it further, you could add a table

CREATE TABLE args ( instruction_id int, argkey varchar(64), argval varchar(255) )

and use e.g. GROUP_CONCAT to concatenate them together, if the maximum length of GROUP_CONCAT isn't a limiting factor. Otherwise, you could still concatenate them at runtime. This seems clunky to me, but it keeps the variable data in rows, and it does allow you to query the data on the server side.
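A query along these lines would do the server-side reassembly (assuming a parent instruction table as in your first option; the names are placeholders):

-- Rebuild each instruction's argument list as a single ordered string
SELECT   i.id,
         i.name,
         GROUP_CONCAT(a.argval ORDER BY a.argkey SEPARATOR ' ') AS args
FROM     instruction i
JOIN     args a ON a.instruction_id = i.id
GROUP BY i.id, i.name;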

erik258
  • Thanks for the extensive answer, I was actually just looking at the JSON data type for the args variable. It seems that it would be good for my use case because of the dynamic nature of the data. Using the first option also allows for a nice mapping between the database table and the JSON string received in the REST API. – Tagor Nov 01 '18 at 01:48
  • oh yeah, it is also very web friendly. Should have mentioned that. Thanks – erik258 Nov 01 '18 at 01:50