0

I have the following sql table called datatable,

date       yesteday today tomorrow reading source
2021-01-01 x        x     x        x       1
2021-01-01 x        x     x        x       2
2021-01-01 x        x     x        x       3
...
2021-01-02 x        x     x        x       1
2021-01-02 x        x     x        x       2
2021-01-02 x        x     x        x       3
...
2021-05-31 x        x     x        x       1
2021-05-31 x        x     x        x       2
2021-05-31 x        x     x        x       3

When I created the table, I set the following,

create table datatable 
(
    date date, 
    yesterday real, 
    today real, 
    tomorrow real, 
    reading real, 
    source varchar
)

Then I created an index:

create index datatable_idx on datatable (date, source)

Every day, the source numbers will repeat themselves. So date together with source would be unique.

I want to improve my data collection and storing process. I was told that I should create a primary key.

I'm guessing the command to use here would be

ALTER TABLE datatable ADD PRIMARY KEY (date, source)

My question is why should I do this and what is the difference between this and the index I created. Will it affect my process?

Also next time I create a table, do I have to create an index and primary like this or is there a way to do both upon creation?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
anarchy
  • 3,709
  • 2
  • 16
  • 48
  • It's a confusing question. In a well designed relational database, all data in a table should be identified by and related to its primary key. It's hard to imagine designing a table without first deciding what its primary key is and how the other columns in the table are related to it. – scottb Jun 01 '21 at 16:28
  • because I dont have any other relationships, just this table, but it has more than 100k rows – anarchy Jun 01 '21 at 16:29
  • You can't have a table without having a primary key. The row has to be identifiable and retrievable in some way. It seems that in this table, there already is a primary key, which is the composite key (date, source). – scottb Jun 01 '21 at 16:32
  • i tried to do ALTER TABLE datatable ADD PRIMARY KEY (date, source), but i got this error DETAIL: Key (source, date)=(t, 2021-04-02) is duplicated. – anarchy Jun 01 '21 at 16:35

1 Answers1

1

A primary key is a constraint that specifies that the values in its column(s) must be (mutually) unique and not null. This is so that it can be guaranteed to uniquely identify every row in the table, which is most useful when wanting to create foreign keys that refer to this table. It's still worth having if you don't have other tables, as it will prevent your table getting into a bad state where, for example, you have more than one of the same source value on a particular date.

The primary key field(s) will almost always have an index on them as well as they are used often for lookups and JOINs, but the two concepts are separate.

Some DBMSs (e.g. MySQL, SQL Server) automatically create a clustered index on the primary key, meaning that data in the table is sorted on disk by the field(s) comprising the primary key to make the common operations above even faster. However, postgres does not do this by default.

You can specify the primary key on table creation using the following syntax:

create table datatable 
(
    date date, 
    yesterday real, 
    today real, 
    tomorrow real, 
    reading real, 
    source varchar,
    PRIMARY KEY (source, date)
)

You can also add a new, auto-incrementing integer field to act as your primary key (generally referred to as a surrogate key or artificial key). You might want to do this if you don't have any other good candidates in your data items, but it can have other benefits as well (potentially faster JOINs for example).

create table datatable 
(
    id serial primary key,
    date date, 
    yesterday real, 
    today real, 
    tomorrow real, 
    reading real, 
    source varchar
)
Rob Streeting
  • 1,675
  • 3
  • 16
  • 27
  • any idea how i can create my primary key for this table ? i tried to do ALTER TABLE datatable ADD PRIMARY KEY (date, source), but i got this error DETAIL: Key (source, date)=(t, 2021-04-02) is duplicated., there are multiple duplicates of date everyday, and the source repeats itself. im not sure how to do it – anarchy Jun 01 '21 at 16:39
  • That error means that there is already duplicate data for the two fields you want as your primary key - the combination "t" and "2021-04-02" appears in more than one row. That's not allowed because as I mentioned in the answer, primary key means unique and not null. You'll need to either delete one of them or rethink which fields should be in your key - perhaps there's an additional field that's needed to make each row unique. – Rob Streeting Jun 01 '21 at 16:40
  • how can i check? there shouldnt be i didnt make any, im using postgresql. I made a copy of my original table, then i did \d temptable. I dont see any index. Yet i get this error – anarchy Jun 01 '21 at 16:43
  • try running the query `SELECT * FROM datatable WHERE source='t' AND date='2021-04-02'`. It should show the duplicate rows. – Rob Streeting Jun 01 '21 at 16:45
  • damn i see them. – anarchy Jun 01 '21 at 16:45
  • https://stackoverflow.com/questions/67793165/sql-drop-duplicate-values-based-on-certain-columns-only-keep-newer-value-based could you please help me with this if you know how – anarchy Jun 01 '21 at 17:07
  • 1
    There is a lot of good in this answer. You should remove the mention that "most DBMS" do something that (as far as I know) only MySQL does; that is misleading. Also, you should suggest an automatically generated primary key if there is no obvious candidate. – Laurenz Albe Jun 02 '21 at 03:06
  • SQL Server also auto-creates a clustered index FYI: "Creating a primary key automatically creates a corresponding unique clustered index, or a nonclustered index if specified as such." https://learn.microsoft.com/en-us/sql/relational-databases/tables/create-primary-keys?view=sql-server-ver15. I implemented your suggestions, thanks. – Rob Streeting Jun 02 '21 at 09:48