-1

I have to design a wide table for a database (timescaleDB, which will create hypertables based on date), but it seems like there are no possible primary keys, even if we are talking about composite keys.

| id | attribute1 | attribute2 | attribute3 | attribute4 |      date_time
| ---| ---------- | ---------- | ---------- | ---------- | -------------------
| P1 |     A      |     20     |    NULL    |    NULL    | 2021-01-01 00:00:00
| P1 |     B      |     10     |    NULL    |    NULL    | 2021-01-01 00:00:00
| P1 |    NULL    |    NULL    |    200     |    300     | 2021-01-01 00:00:00
| P2 |     C      |     25     |    NULL    |    NULL    | 2021-01-01 00:00:00
| P2 |    NULL    |    NULL    |    150     |    400     | 2021-01-01 00:00:00

The problem is that we are scraping data that is describing P1, P2, etc. as a whole, and also that is describing only a part of P1 (A and B are part of P1) P2 (C), etc...

Is there any way to make this work without splitting up the table?

  • 2
    What about adding an auto-increment column that will be the unique key, or a column drived by a sequence? – Marc Le Bihan May 07 '21 at 14:09
  • Strikes me that `NULL` in `attribute1` ***implicitly*** means something? Or could `P1` have multiple rows (for the same datetime) where `attribute1` is `NULL`? *(If `NULL` implicitly means something, use an explicit value instead, then use that column in your key)* – MatBailie May 07 '21 at 14:15
  • 1
    Also, why are you opposed to having two tables? One for when `attribute1` is `NULL`, and one for when it's not? Some columns always being `NULL` when `attribute1` is `NULL` *(and vice versa)*, is an indication that you're ramming two tables together in to one, against normal database design practice... – MatBailie May 07 '21 at 14:21
  • Attribute1 and Attribute2 will always be NULL for P1, those are describing A and B (and they belong to P1). Similarly, attribute3 and attribute4 are going to be always NULL if there is A, B, C, etc. because those attirubtes are describing P1. Personally I would also go with separate tables, it's just that there are some people around who are hard to convice :) – csicskagyasz May 07 '21 at 14:29
  • I am not an expert (at all) in timescaleDB, but wouldn't you create a unique compossed index acting as PK? It accepts NULL values – James May 07 '21 at 14:31
  • 1
    Does this answer your question? [What to do with null values when modeling and normalizing?](https://stackoverflow.com/questions/40733477/what-to-do-with-null-values-when-modeling-and-normalizing) – philipxy May 07 '21 at 19:38
  • @JaimeDrq For a UNIQUE NULL column set can be multiple rows in a table with all the same contents, so a subrow doesn't even uniquely identify itself. Also a FK can be declared to REFERENCE a UNIQUE NULL column set, but FK column set subrows with a null don't reference anything, they satisfy the FK constraint no matter what's in the referenced table. – philipxy May 08 '21 at 00:04
  • This particular use of null is a faq, DB subtyping/inheritance/polymorphism. [How can you represent inheritance in a database?](https://stackoverflow.com/q/3579079/3404097) PS Before considering posting please read the manual & google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular names/strings/numbers, 'site:stackoverflow.com' & tags; read many answers. If you post a question, use one phrasing as title. Reflect your research. See [ask] & the voting arrow mouseover texts. – philipxy May 08 '21 at 00:06

2 Answers2

0

You can follow the design below. The following structure does not store any null values in the database

create table parenttable
(
  id int identity,
  Name nvarchar(10),
  primary key(id)
)

create table childtable
(
  id int identity,
  parent_id int,
  attribute nvarchar(50),
  valueattribute nvarchar(50),
  date_time datetime,
  primary key(id),
  foreign key(parent_id)references parenttable
);



insert into parenttable values
('P1'),
('P2')

insert into childtable values
(1,'attribute1','A','2021-01-01 00:00:00'),
(1,'attribute2','20','2021-01-01 00:00:00'),
(1,'attribute1','B','2021-01-01 00:00:00'),
(1,'attribute2','10','2021-01-01 00:00:00'),
(1,'attribute3','200','2021-01-01 00:00:00'),
(1,'attribute4','300','2021-01-01 00:00:00'),
(2,'attribute1','C','2021-01-01 00:00:00'),
(2,'attribute2','25','2021-01-01 00:00:00'),
(2,'attribute3','150','2021-01-01 00:00:00'),
(2,'attribute4','400','2021-01-01 00:00:00')

select *
from parenttable p join childtable c on p.id = c.parent_id

result in dbfiddle: https://dbfiddle.uk

Meysam Asadi
  • 6,438
  • 3
  • 7
  • 17
0

Attribute1 and Attribute2 will always be NULL for P1, those are describing A and B (and they belong to P1). Similarly, attribute3 and attribute4 are going to be always NULL if there is A, B, C, etc. because those attirubtes are describing P1.

There is not enough information in your problem statement to answer your question.

I don't understand the above description, but it's enough to tell me you need to apply functional dependency analysis, and create as many tables as "those are describing" exist.

attribute3 and attribute4 ... are describing P1

That suggests you should have a table representing P1 things, with attribute3 and attribute4 as columns (preferably with meaningful names).

Organize your tables around the things you're modeling.

  • Look for columns that cannot be NULL for particular things. Those belong in the table depicting one kind of thing.
  • Then look for columns that might be NULL for a certain kind of thing. Those can be NULL-able columns, or a separate table sharing the same key, with optional cardinality.
  • There are no other kinds of columns.

Once you've grouped your column into tables and distinguished what's necessary from what's not, you can look over the mandatory columns for a candidate key. There is always such a key, even if it includes all the non-NULL columns. Why? Because two identical rows are indistinguishable from each other. If you think you need two such rows, what you really need is 1 row, and a quantity column (not in the key) indicating how many such exist.

James K. Lowden
  • 7,574
  • 1
  • 16
  • 31