0

There's a village where villagers visit multiple houses. I want to store which villager visits which house, and how often they visit it. I decided to create the table as follows;

CREATE TABLE 'visitis'('villager' TEXT NOT NULL, 'house' INTEGER NOT NULL, 'amount' INTEGER NOT NULL DEFAULT 0);

The table is stored as follows;

  • villager stores the villager's name. Each villager has a unique name.
  • house is the address of the house. This is an integer.
  • amount stores how many times that villager has visited that house.

This table does not contain a primary key, and is thus, what I consider, not optimal. Both the amount of villagers and the amount of houses is undefined to a level that it is no longer efficient to store the villagers or the houses as a column.

Multiple rows may contain the same villager name, and multiple rows may contain the same house address. The only known uniqueness, is that the combination of villager and house is unique. Hence I wondered if there was a method to use the two columns as a shared primary key. This could save a lot of time, but I'm not sure if SQLite is capable of doing this. If not, is there a better method to do this? I have the conditions that;

  1. Most villagers will visit multiple houses. However, this amount is relatively small and can be assumed to be below 50 houses;
  2. There are a lot of houses (integers) that will not be visited. However, the houses that are visited, will often be visited by up to 10 villagers;
  3. Villagers will visit a limited amount of houses, with but a few visiting over 1000 houses;
  4. It is important to store how often which villager has visited which house;
  5. Data needs to be accessed often by both the villager's name and the house's address number;
  6. Both the amount of houses and villagers can get very high.

Perhaps I'm missing something obliviously obvious. Am I...?

Randium
  • 224
  • 2
  • 10
  • Hi. The only CK (candidate key) here is {villager, house}. That's the only possible PK if you decide you want one. Time to read a textbook on information modeling, the relational model & database design. PS This is not clear: "Both the amount of villagers and the amount of houses is undefined to a level that it is no longer efficient to store the villagers or the houses as a column." Do you mean something re reducing string storage by the implementation? Do you know that you have described a table with a row for every possible villager-house pair? Are you trying to avoid that? – philipxy Sep 15 '18 at 13:28
  • This is a faq. Please always google many clear, concise & specific versions/phrasings of your question/problem/goal with & without your particular strings/names & read many answers. Add relevant keywords you discover to your searches. If you don't find an answer then post, using one variant search for your title & keywords for your tags. See the downvote arrow mouseover text. (Eg google your title with 'site:[so] sqlite'.) – philipxy Sep 15 '18 at 13:29
  • Possible duplicate of [Sqlite primary key on multiple columns](https://stackoverflow.com/questions/734689/sqlite-primary-key-on-multiple-columns) – philipxy Sep 15 '18 at 13:32

1 Answers1

2

The primary key of this table would seem to be the combination of the villager and house columns. SQLite has no problems with a primary key consisting of more than one column:

CREATE TABLE visitis (
    villager TEXT NOT NULL,
    house INTEGER NOT NULL,
    amount INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (villager, house)
);

The logical reason for this is that a given villager visiting a given house needs to be represented by only a single record. The amount column handles the case where there could be more than one visit.

But while this fixes your immediate problem, it is not ideal, because your table is not normalized. The visitis table is what is usually referred to as a junction table, because it connects two other tables in some meaningful way. Here is a better design:

CREATE TABLE visitis (
    villager_id INTEGER NOT NULL,
    house_id INTEGER NOT NULL,
    amount INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (villager_id, house_id),
    FOREIGN KEY (villager_id) REFERENCES villages (id),
    FOREIGN KEY (house_id) REFERENCES houses (id)
);

Now we only persist pairs of foreign keys, each of which points to a record in the villages or houses table. This way, we don't duplicate any data.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Normalization above 1NF replaces a table by projections of it. It.does not introduce new columns. Under common sense the question's table is in 5NF. It can't be normalized further. This "duplication" is an implemention-based notion that has nothing to do with normalization, involving ids not being "duplicated" even though other values in the same places would be "duplicated", and is for data compression & related performance. Ironically, reducing "duplication" increases redundancy. – philipxy Sep 15 '18 at 13:14
  • @philipxy The title question is whether SQLite supports a primary key consisting of multiple column. The answer is yes, and I have stated why this is the case. If you are suggesting that the OP should additionally store duplicated information, then you should explain why. – Tim Biegeleisen Sep 15 '18 at 15:41
  • That comment doesn't address my comment. Mine says that the alternate design is *not* from normalizing. (Which doesn't add ids.) (Or address insert/delete anomalies--but if that's part of the re-design, it unjustifiably assumes that the original design is wrong & that it doesn't store rows with 0.) PS You don't say what exactly is to be stored in the new tables but once we have separate tables for villagers & houses we would typically (presumably per non-"duplication"--whatever that is) not store villager-house pairs with 0 visits, whence 0 seems a problematic default. – philipxy Sep 16 '18 at 00:25