0

I'm building a database for monitoring devices and I've reached an impasse.

I have 40 different types of devices, each of which records different information and have some common (e.g. device name, sample date) but mostly differing attributes (e.g. signal_50_100Hz, device_arm_attached).

As I see it, there are two options:

  1. Build a large table that includes and exhausts all attributes from all devices.

    • Cons: Not all devices will have values for all attributes, and not all attributes will be relevant for all devices. Much of the table may end up empty and I may have to build a table to record attribute names for each device in order to generate sensible output.
    • Pros: Having one table greatly simplifies things, and makes it easier to aggregate and roll up values later, less SQL to write, more generalized, and easier to understand.
  2. Create a separate table for each device type, listing each attribute separately in each table.

    • Cons: Need to write custom SQL every time I get a new device, rehash the entire system, no common interface. Large number of tables to build and manage independently.
    • Pros: Everything is laid out succinctly and can be decoupled to stand alone if need be. There is no extra baggage and one device type is not coupled to other device types via the generalized set of attributes.

The intent of this database is to continually aggregate information from these monitoring devices into a single view which then gets queried depending on the needs of the analyst.

I like the idea of creating a different table for each device, and then using a common view to aggregate the relevant data from those tables into a more usable and common form. This would allow me to retain the data (for retention sake), and aggregate it as needed from a common, abstracted layer, one large view.

Do you see any issues with this approach? Which approach makes more sense? What other strategies might I employ? Does anyone have any general advice on how to better approach this issue?

johnnyB
  • 1
  • 2

2 Answers2

0

In most cases, if you add a new device and it has new attributes and you intend for your application to use those attributes somehow, you are going to have to write code. There is no way to avoid that. Even if you have a flexible table structure that can hold anything, there is no point in holding onto something unless you do something with it, and the software that does something with it will need modification in order to do that something. So the idea of "decoupling" and never having to change anything again is just an abstract dream.

If you plan to hold on to data but don't have any plans to do anything with it at all (storage for storage sake, e.g. for audibility or compliance purposes), you could store it all in a flexible field (say a large varchar) as an XML or JSON structure. That would be completely flexible.

In either case you'd want to start with a single table. The table could include a unique primary key for each device, a name or serial number, model number, and then a series of columns that are common or useful across all devices. If an attribute is common but only exists on, say, 90% of your devices, it would be OK to create a column for the attribute and leave it null when it doesn't apply. If an attribute or set of attributes belongs to only a small subset of devices, I would consider breaking those into a separate table or tables, but with the same primary key. To help insulate your code from schema changes, you could build a view that would join the tables together in a useful way and use the view as the basis for stored procedures or select statements.

John Wu
  • 50,556
  • 8
  • 44
  • 80
  • Hi John, thanks for the input. The intent of this database is to continually aggregate information from these monitoring devices into a single view which then gets queried depending on the needs of the analyst. I like the idea of creating a different table for each device, and then using a common view to aggregate the relevant data from those tables into a more usable and common form. This would allow me to retain the data (for retention sake), and aggregate it as needed from a common, abstracted layer, one large view. Do you see any issues with this approach? – johnnyB May 03 '17 at 13:54
  • The problem with that approach is that you have to change the schema and the aggregation logic when you add a new device type. – NaturalData May 08 '17 at 14:46
0

This is the age-old issue of polymorphism in relational databases. There are a few patterns to consider.

One table with the union of all columns for all devices. Kind of sucks because there will be lots of null values for columns that don't relate, so it makes it impossible to add NOT NULL constraints that should apply to some devices. Also, you have to change the schema when adding a new device that requires new columns.

One table for each device and then union them together in a view for queries over all devices. Kind of sucks because you have to add a new table and modify the view for new devices.

The EAV pattern. Kind of sucks because it affects query performance and it's impossible to add NOT NULL constraints that should apply to some devices.

One table with a JSON column for the attributes that vary among devices. This is getting closer to a natural representation of your data, but it kind of sucks because you have to manage any indexes of the values. https://dev.mysql.com/doc/refman/5.7/en/create-table-secondary-indexes.html#json-column-indirect-index

NaturalData
  • 459
  • 3
  • 8