1

How would you design a database that has a table with 2 fields and just one of them can be set at a time, without too much redundancy? For example a file system:

Let's say that we have a table with drives, one with folders and one with files.

Drives and files are quite trivial. However, folders have a parent which can be either a folder (in which case the reference is to the same table) or a disk (in which case the reference is to a disk row).

Would you add some extra tables?

Andrew
  • 11
  • 2

2 Answers2

3

One table with id, name, parentid, and (if you want) type. Folders and files all point to their parent with parentid. Disks do not have a parentid. The only field that is NULL is the parentid field of the (relatively few) records that are of type Disk.

Larry Lustig
  • 49,320
  • 14
  • 110
  • 160
  • 1
    @Larry. That is "database design" ? – PerformanceDBA Nov 25 '10 at 10:58
  • @Snarky. Not only is it a "database design" but it has the advantage over yours that it is correct (folders and drives can be empty and therefore not easily distinguishable from files), it does not imagine (falsely) that filenames are unique across all folders (or, alternatively, it does not required the denormalization of storing the fully qualified path as a file name that would be required to make filenames unique and available as a candidate PK). – Larry Lustig Nov 25 '10 at 14:33
  • 1
    @Larry. Did not intend to be "snarky" whatever that means. You need to read the literature. If the column is a migrated element, as part of the key (parent PK is used to form child PK) it is not um "denormalisation". – PerformanceDBA Nov 25 '10 at 16:29
  • If you don't know what "snarky" means, how do you know that you didn't intend to be it? There are no google results for "sql theory migrated element" or "relational theory migrated element". If there were, however, I imagine they would refer to a dependent _column_ of information in a second table rather than using a single column to record two attributes, or copying the same attribute twice in the same row. Not to mention the other problems your "design" has. – Larry Lustig Nov 25 '10 at 16:48
  • 1
    @Larry: It sounds like a word a child would use, I am just guessing what it means. You expect to google for knowledge ? ROTFL. No use imagining, buy a textbook and read it. I will fix up my "design". The comments here were re your "design". – PerformanceDBA Nov 25 '10 at 16:53
  • 1
    I'm happy to receive polite corrections from people who have better solutions than mine. I'm even okay with a rude answer that teaches me something. But you were both rude and wrong, never a good combination. I see you've deleted your answer which at least serves the cause of not miseducating the original poster. And yes, I expect to find a reference to any computer subject on google. – Larry Lustig Nov 26 '10 at 04:41
  • 1
    @Larry: My way of providing a service to SO was to delete the Answer until I had corrected it (it was pretty late); this has now been done. If there is anything *specific* that is *wrong*, please point it out. SO suffers from pathetically shallow answers provided by people who obtain pathetically shallow info on the web; and who do not have actual education and experience. There is a reason why education and professionals costs money. Back to my initial comment: you have not "designed" the database, or answered the database-design question; three tables, because they have not been modelled. – PerformanceDBA Nov 26 '10 at 05:20
  • I like this design and it works! Drives are really just "top level folders" and they always looked like a wierd construct to anyone used to the cleaner *nix file systems. You dont really need the "type" column as you have two indicators that this is a drive - parent is null and =~ /^[A-Z][:]$/. – James Anderson Nov 26 '10 at 05:27
  • @James, you still need the type indicator to distinguish folders and files — and, once you're using it for that purpose I think it would make for cleaner application code to store the disk type indicator there as well (if, in fact, the application even wants to distinguish between disk roots and folders). – Larry Lustig Nov 26 '10 at 14:23
  • @Snarky: I responded with reference to the specific problems that your solution contained. SO suffers from a number of things one of which is rude people who consider themselves experts but who can not, in fact, produce working code or algorithms (programming as a profession suffers from this as well). You appear to be one of those people, a fault you compound by insulting people who _can_ solve problems. Ultimately, I'm not so concerned with your wrongness as your rudeness. – Larry Lustig Nov 26 '10 at 14:27
  • 1
    @Unprofessional: So you do not have any specific problem with my answer, just a generalised opinion about the weather. All those non-technical opinions re "normalisation" have been proved false. No wonder SO contains so much rubbsih from non-professionals. Note that SO encourages professional answers, specifics, not amateurs. – PerformanceDBA Nov 27 '10 at 00:31
  • I have described several specific problems with your answer (it's wrong and denormalized). You've acknowledged that your original answer was a "mistake" which you've tried (but failed) to correct in a second version. You persist in describing SO as "unprofessional" and "shallow" but you can't correctly solve even a simple programming task. Smugness is a characteristic barely tolerable in those who are entitled to display it, and that is a class from which you do not inherit. – Larry Lustig Nov 28 '10 at 13:58
  • 1
    @Unprofessional. 1) explain specifically how it it "denormalised". 2) the mistake was elsewhere, you have not noticed. 3) I didn't describe SO as unprofessional and shallow, I described the many posters who are technically clueless as such. 4) You have not proved my answer wrong, so (again) your opinions (technical or otherwise) are false. 5) Still waiting for you to substantiate your wild claims, until then, they are just that. – PerformanceDBA Nov 29 '10 at 04:38
1

Minor Error Corrected 26 Nov 10

I do not see why you need three files (they are not "tables" in a database until they are normalised together).

If the content you are actually dealing with drives; folders; files, then they are all files. This leads to the famous directory tree problem, which was solved in Unix I-Nodes, but remains a problem on any MS file system.

  • The highest level Nodes have FileNames of the form "X:" (a Drive)

    • They have no Parent Nodes

    • Use CHAR(0) instead of Nulls (that's a substitute, eliminating the problems of Nulls in the database, and therefore removing the performance issue; it is not solving the Null Problem, because Drives have no parent. You still need code).
      .

  • Not two or three separate and disparate files with massive data duplication and update anomalies (you did ask for a database-design answer, right ?). It is not trivial.

  • No blindly sticking Id-iot "key" columns on everything that moves.

  • If IsFolder, the Node is a branch in the directory tree, a Folder; othe wise it is a File.

  • Files can be promoted to Folders: set 'IsFolder' to true; after which then can be used as a ParentNode. If you have ISO/IEC/ANSI standard SQL, you can implement a CHECK constraint to ensure ParentNodes are IsFolder

  • Where the row is not IsFolder, a File, it cannot be used as a Folder or as a ParentNode.

  • Filenames must be unique within the Node (Folder), therefore an Unique Index is supplied. AK is Standard notation for Alternate Key meaning Unique Index.

Directory Data Model http://www.softwaregems.com.au/Documents/Student%20Resolutions/Andrew%20DM.pdf

▶Link to IDEF1X Notation◀ for readers who are unfamiliar with the Relational Modelling Standard.

Response to Comments

  1. Some commenters think that there is "duplication" in this model or that the FileName is concatenated. This is apparently due to not being able to read the model, which is provided in Standard notation. Or the inability to recognise highly Normalised structures.

    • there is no duplication of any kind

    • FileNames are atomic, eg CHAR(30), not the entire path or URL.

    • Atomic means the FileNames are not concatenated

    • FileNames are not duplicated within the node. If the same FileName is used in another Node, that is not a duplication, that is the reality, and that is allowed.

    • Highly Normalised structures are in fact, very small, and terse.

  2. I would ask anyone having difficulty with understanding this model to post a specific question, rather than to make statements about subjects they do not know.

  3. Anyone who thinks they have found a "problem" with this model is likewise requested to behave like a technical person, and post the specific error that they think they have found, rather than to post generalities and personal opinions.

PerformanceDBA
  • 32,198
  • 10
  • 64
  • 90
  • You cannot use filename as a PK, it is not unique across folders. To make it unique you would need to make the filename the fully qualified path which violates two rules of normalization — two attributes (parent and filename) stored in a single column and repetition of data (parent stored in the filename and parent fields). Furthermore, you're going to wind up with some very, very long key values. Finally, unless you plan on storing the symbolic "." and ".." links, folders and drives can be empty and therefore indistinguishable from files in your schema. – Larry Lustig Nov 25 '10 at 14:39
  • 1
    @Larry: Mistake re Filename uniqueness across folders has been corrected. No dots in the FileName. I have added `IsFolder` for people who cannot derive data. Your comments re "normalisation" are simply incorrect; the seeker may have a better understanding of it. Feel free to ask *specific* questions (as opposed to making generalised accusations). – PerformanceDBA Nov 26 '10 at 05:14
  • This is no better than your original attempt. You've simply pushed the problem from one field to the other. This "design" contains one of two flaws — either you're storing the entire path in the Parent field (in which case the data is badly denormalized) or you're storing only the folder / drive name there, in which case the values are not unique and they cannot possibly function as an identifying key back to the parent record. – Larry Lustig Nov 28 '10 at 13:55
  • 1
    @Larry. You have problems reading, and understanding, standard Data Models. Please click on the second link and familiarise yourself with the Standard. 1) I've already posted the FileName is **not** concatenated, it is atomic. 2) Er, the FK is `Parent**Node**`, there is no `ParentFileName`. 3) As discussed, as per Unix inodes, Node is an Integer. 4) There is a second Index as per my last para. 5) Feel free to ask specific questions (as opposed to posting ill-formed opinions), you will embarrass yourself less. – PerformanceDBA Nov 29 '10 at 04:31