-1

I want to create a simple database that have to store nodes, their functionalities and address to notify in case of problems. Initially I was thinking on using a simple SQL database:

CREATE TABLE IF NOT EXISTS push_notifications (
  node text NOT NULL,
  functionality text NOT NULL,
  address text NOT NULL,
);

where nodes and functionalities can have many address, N to N. This way to get the address I will just execute this two sentences:

SELECT address from push_notifications where node=XX and functionality=YY;
SELECT node, functionality from push_notifications where address=XX ORDER BY node, functionality;

However after reading a bit, I have several doubts on:

  • Is that ok for a database that initially won't have more than 10000 entries?
  • Should I use the normalize way of organizing tables, that is, one for nodes, another for functionalities and another for addresses and the use JOIN in SELECT? Then, how could I automatically delete the entries from the tables that are not longer linked, ie a node that has no functionality and no endpoint?
  • Should I use for instance a simple database engine like Redis, setting keys to node-functionality (string and value a list of addresses) and another set of keys to the endpoints (hash)?

I would like to add that I will be using Java to handle the access to data.

Thanks for you help. I would really appreciate and advise on what is the best way to do things like that.

Edit: option for select with multiple simple tables (I think is OK)

 CREATE TABLE IF NOT EXISTS node (
   id integer PRIMARY KEY AUTOINCREMENT,
   iri text NOT NULL,
   UNIQUE(iri) ON CONFLICT IGNORE -- ON CONFLICT REPLACE
 );

 CREATE TABLE IF NOT EXISTS functionality (
   id integer PRIMARY KEY AUTOINCREMENT,
   name text NOT NULL,
   UNIQUE(name) ON CONFLICT IGNORE -- ON CONFLICT REPLACE
 );

 CREATE TABLE IF NOT EXISTS address (
   id integer PRIMARY KEY AUTOINCREMENT,
   url text NOT NULL,
   UNIQUE(url) ON CONFLICT IGNORE -- ON CONFLICT REPLACE
 );

 CREATE TABLE IF NOT EXISTS push_info (
   node integer NOT NULL,
   functionality integer NOT NULL,
   address integer NOT NULL,
   UNIQUE(sensor, endpoint) ON CONFLICT IGNORE, -- ON CONFLICT REPLACE
   FOREIGN KEY(node) REFERENCES node(id) ON DELETE CASCADE
   FOREIGN KEY(functionality) REFERENCES functionality(id) ON DELETE CASCADE
   FOREIGN KEY(address) REFERENCES address(id) ON DELETE CASCADE
 );


 SELECT address.url as address
   FROM address
 INNER JOIN push_info
     ON address.id = push_info.address
 INNER JOIN node
     ON node.id = push_info.node
 INNER JOIN functionality
     ON functionality.id = push_info.functionality
 WHERE
   node.iri = "node1" AND
   functionality.name = "functionality1";
jlanza
  • 1,208
  • 3
  • 23
  • 43
  • 1
    Your data structure is not clear. Can any node be combined with any functionality and any address (and any address with any node and functionality and so on), but not all of them automatically exist? Then you will end up with a table like this. Although I wouldn't use `text` identifiers, you could either use `varchar` (and make sure you enter only valid values and treat them as code) or use 2-3 lookup tables with all functions and nodes (and maybe addresses) and use their id or code to enter in that table. – Solarflare Nov 09 '16 at 15:22
  • @Solarflare it is as you describe, multiple nodes can have same funct and address, its like an all all all. The lookup tables is the normalized option I'm suggesting where I have a list of nodes, a list of func, and a list of address. Then using one table I will be joining the ids of each. Is it very resource consuming? Or is it the better option. Or is it better to use the Redis approach? – jlanza Nov 09 '16 at 18:53
  • 1
    Your table is normalized as it is. Using 3 lookup tables has nothing to do with normalization, you only have to do it if you need e.g. a description or any other data in addition to your key (which can be a string, though I would use `varchar` then, and keep them short, preferable non-utf8), or if you want to validate the values with a foreign key. Otherwise it's up to your personal taste if you want to replace them with some integer ids. Same for redis or mysql. 10k rows will not make any difference performancewise, so it's personal taste. Just add indexes. – Solarflare Nov 09 '16 at 20:20
  • @Solarflare thanks a lot. I will use varchar. But you recommend to use the 3 lookup tables and the joined one, isn't it? Then, how can I delete unused references? In the one I include it's deleting one row, while in 3 lookup tables I have to check references. – jlanza Nov 09 '16 at 21:33
  • Your message does not explain what you are trying to do or ask. *Please address the very first comment on your question.* You are not explaining yourself clearly. Your text refers to things that you do not explain or relate to your tables. Please take the time to express what you are trying to achieve. We can't read your mind. (Eg what's an endpoint? Eg my answer has already explained that "nodes and functionalities can have many address, N to N" is not helpful. PS What's with unique (sensor, endpoint) when there is no such columns? – philipxy Nov 15 '16 at 01:05

1 Answers1

0

Normalization You have a table. Before you start "normalizing", or start talking about "normalizing", find out what normalization is. Here is the key question: Can you replace this table by smaller ones so that they always join to it? For that to be so, it must possible to phrase the criterion for a row to be in the table in a given application situation as "... AND ..." for one or more ANDs. If you can, normalization may suggest doing that replacement. If you can't, it won't. (Cardinalities do not in general allow us to determine that.)

Further tables Only you can know whether you want to store information that this table cannot tell you, so that you need more tables. Eg if you want to record that a particular node exists even though it does not participate in the application relationship that this table represents, then you need another table.

Relational vs not Relational databases allow generic declarative querying and integrity enforcement, with certain implementation costs (and some automated optimization). Other data structures each support specialized & largely non-declarative querying and integrity enforcement, with improved implementation costs, but with other cases typically awkward and/or expensive. You have to know your usage patterns and how costs and benefits trade off among these many dimensions in order to decide which design is "best". We can always reasonably start with a relational design then specialize where shown necessary and desireable.

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • My main problem is that I don't know if it really degradetes performance to have only one table, duplicating strings and search on it, or it will perform better to have simple tables and join table and search by where whaterver equals string. Have a look at the edited question. Which option is better? And simpler to manage? Considering both inserting, selecting, and deletion. – jlanza Nov 14 '16 at 18:24
  • Per my answer, when we normalize we "replace this table by smaller ones so that they always join to it". Replacing columns by ids is not normalization. If you're asking about replacing a table with string columns by one with id columns plus some tables mapping ids to strings then *say so*. (Ie make the effort to find words and sentences that say that. Like I just did). It clearly *complicates* the schema and queries. But per my answer, "best" [*depends*](http://stackoverflow.com/a/32151278/3404097). PS There's *nothing* about this in your question. I'm guessing from your code & this comment. – philipxy Nov 15 '16 at 01:42
  • If you want to "optimize" then you first need to learn about *design* and then learn about *multiple ways of querying for the same thing* then learn about *optimization in your particular DBMS*. Plus to compare to an alternative system then you have to learn *the same things about it*. – philipxy Nov 15 '16 at 03:41