Modeling optional properties of an Entity, based on a mutable discriminator in a parent Entity

Question

I am having a hard time getting a relational model out of my project's requirements.

The general context is about devices posting sensor data.

In the table representations, the PKs of each table are above the dashed line and I also include some sample data. Any irrelevant column is omitted.

There are various types of sensors discriminated by the kind of data they read. We have 'Measure' sensors measuring a quantity as a NUMERIC value (e.x. temperature) and sensors reading a state/switch change as a BIT value(e.x. opening-closing of a door).

[SensorType]
########################    | SensorTypeCode | ReadingTypeCode |
# SensorTypeCode       #    ------------------------------------
# -------------------- #    | Temperature    | Measure         |     
# ReadingTypeCode      #    | Humidity       | Measure         |
########################    | Door           | Switch          |

There are various types of devices in respect to the types of sensors they contain. Only one of each type of sensor can be contained in a device. A device type is immutable regarding the sensor types it contains.

[DeviceType]
########################    | DeviceTypeCode |
# DeviceTypeCode       #    ------------------
# -------------------- #    | D_TYPE_1       |   
########################    | D_TYPE_2       |

[DeviceTypeSensorType]
########################    | DeviceTypeCode | SensorTypeCode |
# DeviceTypeCode       #    -----------------------------------
# SensorTypeCode       #    | D_TYPE_1       | Temperature    |
# ---------------------#    | D_TYPE_1       | Humidity       |
########################    | D_TYPE_2       | Temperature    |
                            | D_TYPE_2       | Door           |

A device has a unique identifier and a type. The type of a device can change, meaning that sensors can be added or removed.

[Device]
########################    | DeviceMacAddress  | DeviceTypeCode |
# DeviceMacAddress     #    -------------------------------------
# ---------------------#    | 00:00:00:00:00:00 | D_TYPE_1       |
# DeviceTypeCode       #    | FF:FF:FF:FF:FF:FF | D_TYPE_2       |
########################

The devices post readings that contain some common fields, independently of the device type and sensor types contained in the device.

[Reading]
########################
# DeviceMacAddress     #
# ReadingDtm           #
# ---------------------#
# BatteryLevel         #
# ...                  #
########################

Additionally, a device reading contains data related to the all of the device's sensors. That would be two child tables of Reading, ReadingMeasure and ReadingSwitch, storing sensor data discriminated by the ReadingTypeCode discriminator.

Here is where I start having some trouble finding a solution. As I said before, the device type can change, but a change should not affect any past readings that should still be available.

1st approach

Create a device history table storing any previous device types and add an UpdatedDtm field in Devices, following the modelling process described here.

[DeviceHistory]
########################   
# DeviceMacAddress     # 
# AuditedDtm           #
# ---------------------#
# DeviceTypeCode       #
# UpdatedDtm           #
########################

Structuring the Reading{Type} tables as:

[ReadingMeasure]               [ReadingSwitch]
########################      ########################
# DeviceMacAddress     #      # DeviceMacAddress     #
# ReadingDtm           #      # ReadingDtm           #
# DeviceTypeCode       #      # DeviceTypeCode       # 
# SensorTypeCode       #      # SensorTypeCode       #  
# ---------------------#      # ---------------------#
# Value (NUMERIC)      #      # Value (BIT)          #
########################      ########################

with

(DeviceMacAddress, ReadingDtm) FK referencing Reading
(DeviceType,SensorType) FK referencing DeviceTypeSensorType

That way I can just update Device.DeviceTypeCode and the (DeviceType,SensorType) FK will still have a valid reference to an immutable row in DeviceTypeSensorType. Any new readings will be inserted taking into account the new device type.

It seems to me that I lose a lot of referential integrity that way and I would like a better solution. All database operations will be done through stored procedures implementing the necessary checks anyway, so maybe this approach is acceptable ?

2nd approach

I addition to the tables of the previous approach, also create a Sensor table and two exclusive subtypes of Sensor, SensorMeasure and SensorSwitch

[Sensor]
########################   
# DeviceMacAddress     #    
# DeviceTypeCode       #
# SensorTypeCode       #  
# ---------------------#
# IsObsolete           #
########################

| DeviceMacAddress  | DeviceTypeCode | SensorTypeCode |  IsObsolete |
---------------------------------------------------------------------
| 00:00:00:00:00:00 | D_TYPE_1       | Temperature    | 0           |
| 00:00:00:00:00:00 | D_TYPE_1       | Humidity       | 0           |
| FF:FF:FF:FF:FF:FF | D_TYPE_1       | Temperature    | 0           |
| FF:FF:FF:FF:FF:FF | D_TYPE_1       | Door           | 0           |


[SensorMeasure]               [SensorSwitch]
########################      ########################
# DeviceMacAddress     #      # DeviceMacAddress     #
# DeviceTypeCode       #      # DeviceTypeCode       #
# SensorTypeCode       #      # SensorTypeCode       #
# ---------------------#      # ---------------------#
########################      ########################

In this case:

ReadingMeasure and ReadingSwitch will have an FK (DeviceMacAddress, DeviceTypeCode, SensorTypeCode) referencing SensorMeasure and SensorSwitch respectively.
Sensor will have an FK (DeviceTypeCode, SensorTypeCode) referencing DeviceTypeSensorType.
An insert to Device leads to the appropriate Sensor inserts, depending on the device type.
On device type change, i will mark any sensor removal or re-installation setting Sensor.IsObsolete

I think this approach provides more integrity but requires many redundant relations that seem unavoidable. All the data in the three new tables is already accessible through Device.DeviceTypeCode.

Any thoughts?

Please ask 1 specific researched non-duplicate question re the 1st place you are stuck following the design method reference you name & refer to. Otherwise you're asking us to rewrite it. Include just what is necessary. When distilled this is probably a faq. Before considering posting please read the manual & google any error message or many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names & site:stackoverflow.com & tags; read many answers. PS https://stackoverflow.com/q/3579079/3404097 https://stackoverflow.com/q/190296/3404097 — philipxy, Apr 05 '20 at 22:55
@philipxy You posted me the same links in another related question that got closed for the same reason of being too broad. If you pay any attention I think you can clearly see that I already use the subtype pattern that is referred in the answers of those links. I don't see what I can remove while still containing all the relevant info, but I will try editing sometime in the day. — Giannis Tsim, Apr 06 '20 at 06:47
@philipxy Most probably I can express the question more clearly but I really don't understand how this can be distilled to just a FAQ, especially when I can't seem to find any helpful info even after extensively studying many questions in here. I will also include a text-based schema in the following edit. — Giannis Tsim, Apr 06 '20 at 06:47
You have no basis by which to suggest that I am not paying attention or paying no attention. You are jumping to (wrong) conclusions. PS Your post asks many questions. Ask 1. Like the 1st you come to adding each entity & relation(ship)/association from nothing. Then you can pin down a situtation specifically enough to characterize it in a way that you could successfully search for it. PS If a question is closed, edit it so it can be voted open. — philipxy, Apr 06 '20 at 07:35
@philipxy The links you posted me suggest otherwise. Anyway, I edited the question as best as I could. You can give it a try now and give me some feedback. — Giannis Tsim, Apr 06 '20 at 12:33

Neville Kuyt · Answer 1 · 2020-04-07T12:11:13.213

This is a pretty complicated question, so I'd like to start by restating the problem domain - please correct if I've misunderstood.

The system contains many devices
A device is uniquely identified by an immutable Mac address
A device belongs to exactly one device type at any point in time
A device may change device types over time
A device type contains one or more sensor types, and is immutable over time
A sensor type delivers a reading type, which is one of (switch, measure)
A device produces zero or more readings
A reading contains a "header" defining the reading (e.g. date/time)
A reading contains exactly one measurement for each sensor linked to the device, dependent on the device type that is active at the time of the reading.
A measurement belongs to exactly one reading.

If that is the case, I'd model the relationship of Device to Device Type as being time-dependent:

[Device]
| DeviceMacAddress  | DeviceTypeCode | ValidFrom            | ValidUntil
----------------------------------------------------------------
 00:00:00:00:00:00  | D_TYPE1        | 1 Jan 2020 00:00:01. | 23 Jan 2020 12:23:23
 00:00:00:00:00:00  | D_TYPE2        | 23 Jan 2020 12:23:23 | null

Every time the device type changes, you set the validUntil value for current row (identified by validuntil is null) to the current time, and insert a new row with the validfrom as the same timestamp, and the new device type code.

This allows you to track the change of the device type over time, and recreate it for any give moment in time.

ValidUntil is logically redundant - you could use only validFrom, and find the highest value smaller than the time you're interested in; this is slightly more complicated in big queries.

The primary key would be DeviceMacAddress, ValidFrom - the row is uniquely identified by that combination. If you include DeviceTypeCode in the primary key, you're logically saying that each row is uniquely identified by those 3 attributes, which suggests there can be a row for the same Mac and validfrom, but for a different devicetype, which is probably not correct.

The reading table is as you have outlined; when joining readings with devices, you use the timestamps:

where reading.timestamp between device.validfrom and device.validuntil

You would then have one row in each of the Reading<type> tables for each entry that is valid at the time of reading in the DeviceType table linked to the device. You don't need the column DeviceTypeCode in that table because you can derive that information by looking at the join from Device to DeviceType that was active at the time of taking the reading.

Your understanding is 100% on point. I guess the `ValidUntil` column is redundant since it is always the next `ValidFrom` datetime for a `DeviceMacAddress` and I can easily identify the current Device from it's last `ValidFrom`. Given that a `Device` can switch from and to a type multiple times, i assume you have a PK of (`DeviceMacAddress, DeviceTypeCode, ValidFrom`)? — Giannis Tsim, Apr 07 '20 at 10:55
I've updated the answer - yes, validuntil is redundant (just makes the queries a little easier to understand), but the primary key is just DeviceMacAddress, ValidFrom — Neville Kuyt, Apr 07 '20 at 11:06
I would prefer not to be forced to carry this `ValidFrom` to `Reading` , `Reading{Type}` and some more tables lower in the hierarchy, but maybe there is no alternative where a Device is only identified by the mac address while I still fullfill all requirements. I will give it some more thought and get back to you. — Giannis Tsim, Apr 07 '20 at 11:43
When you say I don't have to keep track of device type in reading, you mean I should remove the `DeviceTypeCode` from the `Reading{Type}` tables? — Giannis Tsim, Apr 07 '20 at 11:51
Yes, because you can find the devicetypecode from joining device to device type at the time the reading is recorded. — Neville Kuyt, Apr 07 '20 at 12:11

Damir Sudarevic · Answer 2 · 2020-04-07T21:12:01.970

This is quite verbose, but hope may help. Essentially device-configuration, and current-configuration tables are introduced. This way readings track (DEV_MAC, DEV_CFG_NO) pair which preserves historical data as devices change configuration. I have also renamed attributes somewhat in order to save on screen real estate.

[p 1] Sensor type SNS_TYP of reading type REA_TYP exists.

(c 1.1) Each sensor type is of exactly one reading type; it is possible that more than one sensor type is of the same reading type.

SensorType {SNS_TYP, REA_TYP}  -- p 1
        PK {SNS_TYP}           -- c 1.1

CHECK REA_TYP in (Measure, Switch)


-- sample
(Temperature, Measure)
(Humidity,    Measure)
(Door,        Switch)

[p 2] Device type of DEV_TYP exists.

(c 2.1) Device type is identified by DEV_TYP.

DeviceType {DEV_TYP} -- p 2
        PK {DEV_TYP} -- c 2.1

-- sample
(d_type_1)
(d_type_2)

[p 3] Device type DEV_TYP contains sensor type SNS_TYP.

(c 3.1) For each device type, that type may contain more than one sensor type; for each sensor type, that sensor type may be contained in more than on device type.

(c 3.2) For each device type and sensor type, that sensor type may be contained in that device type at most once.

(c 3.3) If device type contains sensor type, then that device type must exist.

(c 3.4) If device type contains sensor type, then that sensor type must exist.

DeviceTypeSensorType {DEV_TYP, SNS_TYP} -- p 3
                  PK {DEV_TYP, SNS_TYP} -- c 3.1, 3.2

FK1 {DEV_TYP} REFERENCES DeviceType -- c 3.3
FK2 (SNS_TYP} REFERENCES SensorType -- c 3.4


-- sample
(d_type_1, Temperature)
(d_type_1, Humidity)
(d_type_2, Temperature)
(d_type_2, Door)

[p 4] Device with MAC address DEV_MAC exists.

(c 4.1) Device is identified by MAC address.

Device {DEV_MAC} -- p 4
    PK {DEV_MAC} -- c 4.1


-- sample
(00:00:00:00:00:00)
(FF:FF:FF:FF:FF:FF)

[p 5] Configuration of device DEV_MAC changed to device type DEV_TYP, as device configuration number DEV_CFG_NO, on date of change VALID_FROM.

(c 5.1) Device configuration is identified by the device and the device configuration number.

(c 5.2) For each device and date of change, that device changed configuration on that date exactly once; it is possible that more than one device changed configuration on that date.

(c 5.3) For each device configuration and device type, that device configuration is of exactly one device type; it is possible that more than one device configuration is of that device type.

(c 5.4) If configuration of device changed, then that device must exist.

(c 5.5) If configuration of device changed to device type, then that device type must exist.

DeviceConfig {DEV_MAC, DEV_CFG_NO, DEV_TYP, VALID_FROM} -- p 5
          PK {DEV_MAC, DEV_CFG_NO} -- c 5.1, 5.3
          AK {DEV_MAC, VALID_FROM} -- c 5.2

FK1 {DEV_MAC} REFERENCES Device     -- c 5.4
FK2 {DEV_TYP} REFERENCES DeviceType -- c 5.5


-- sample
(00:00:00:00:00:00, 1, d_type_1, 2019-05-01)
(00:00:00:00:00:00, 2, d_type_2, 2019-07-01)
(00:00:00:00:00:00, 3, d_type_1, 2019-09-01)
(FF:FF:FF:FF:FF:FF, 1, d_type_2, 2020-01-01)
(FF:FF:FF:FF:FF:FF, 2, d_type_1, 2020-03-01)

Time to stop here and consider some assumptions. I do assume that new device configuration is known and entered into DeviceConfig before the device starts streaming data.

Device data may look something like this (JSON):

{ "device":"00:00:00:00:00:00",
  "time":"2019-07-01T17:15:12",
  "data":{ "Battery":75,
           "Temperature":35.5,
           "Humidity":1,
         }
}

This format provides:

{DEV_MAC, SNS_TYP, SNS_VALUE, READ_DTM}

Would like to have:

{DEV_MAC, DEV_CFG_NO, DEV_TYP, SNS_TYP, REA_TYP, SNS_VALUE, REA_TYP, READ_DTM}

In order to "enrich" sensor data, a current configuration view is created.

[P 6] Device DEV_MAC in current configuration DEV_CFG_NO, of device type DEV_TYP, contains sensor type SNS_TYP of reading type REA_TYP.

-- implement as a view
CurrentConfig {DEV_MAC, DEV_CFG_NO, DEV_TYP, SNS_TYP, REA_TYP}
          KEY {DEV_MAC, SNS_TYP} -- Logical


-- sample
(00:00:00:00:00:00, 3, d_type_1, Temperature, Measure)
(00:00:00:00:00:00, 3, d_type_1, Humidity, Measure)
(FF:FF:FF:FF:FF:FF, 2, d_type_1, Temperature, Measure)
(FF:FF:FF:FF:FF:FF, 2, d_type_1, Humidity, Measure)

-- sql server, postgreSQL
CREATE VIEW CurrentConfig AS
WITH
q_00 as (
        SELECT DEV_MAC
             , max(VALID_FROM) as LATEST
        FROM DeviceConfig
)
SELECT c.DEV_MAC
       c.DEV_CFG_NO
       c.DEV_TYP
       w.SNS_TYP
       s.REA_TYP
FROM DeviceConfig         AS c
JOIN q_00                 AS q  ON q.DEV_MAC = c.DEV_MAC
                               AND q.LATEST  = c.VALID_FROM
JOIN DeviceTypeSensorType AS w  ON w.DEV_TYP = c.DEV_TYP
JOIN SensorType           AS s  ON s.SNS_TYP = w.SNS_TYP

The application can now use this view to enrich streaming sensor data -- by matching (DEV_MAC, SNS_TYP) -- and create a set of tuples for each device reading.

{(DEV_MAC, DEV_CFG_NO, DEV_TYP, SNS_TYP, REA_TYP, SNS_VALUE, REA_TYP, READ_DTM)}

These tuples can now be inserted into reading tables.

[p 7] Device DEV_MAC in device configuration DEV_CFG_NO sent data at reading time READ_DTM; this reading was assigned number READING_ID.

Reading {READING_ID, DEV_MAC, DEV_CFG_NO, READ_DTM} -- p 7
     PK {READING_ID}
     AK {DEV_MAC, DEV_CFG_NO, READ_DTM}

FK1 {DEV_MAC, DEV_CFG_NO} REFERENCES DeviceConfig

Subtypes are now discriminated on reading type (REA_TYP). Added some extra control as type checks for SNS_TYP on subtypes.

[p 7.1] Measurement reading READING_ID of sensor type SNS_TYP is SNS_VALUE.

Reading_Meas {READING_ID, SNS_TYP, SNS_VALUE::Numeric} -- p 7.1
          PK {READING_ID, SNS_TYP}

FK {READING_ID} REFERENCES Reading

CHECK SNS_TYP in (Temperature, Humidity, Battery)

[p 7.2] Switch reading READING_ID of sensor type SNS_TYP is SNS_VALUE.

Reading_Sw {READING_ID, SNS_TYP, SNS_VALUE::Boolean}  -- p 7.2
        PK {READING_ID, SNS_TYP}

FK {READING_ID} REFERENCES Reading

CHECK SNS_TYP in (Door)

Note:

All attributes (columns) NOT NULL

[p x]   = predicate  x
(c x.y) = constraint x.y

PK = Primary Key
AK = Alternate Key   (Unique)
SK = Proper Superkey (Unique)
FK = Foreign Key

Modeling optional properties of an Entity, based on a mutable discriminator in a parent Entity

2 Answers2