3

I'm helping a social scientist analyze the results of his research. He's recorded hundreds of interviews and other meetings and collected related documents. He has entered data on this work into four tables in MS Access:

+---------+-----------+-----------------------------------------------------+
|  Table  |    Key    |                  Example contents                   |
+---------+-----------+-----------------------------------------------------+
| cases   | ID_Case   | Description, case data                              |
| persons | ID_Person | Name, contact data                                  |
| events  | ID_Event  | Date, time, location of interview, meeting, etc.    |
| files   | ID_File   | Path, filename, date of recordings, documents, etc. |
+---------+-----------+-----------------------------------------------------+

We can call these the "base" tables.

To represent relationships among these tables, he first tried a set of six tables linked to the original four on their keys:

+--------------+---------------------+
|    Table     |         Key         |
+--------------+---------------------+
| CasePersons  | ID_Case, ID_Person  |
| CaseEvents   | ID_Case, ID_Event   |
| CaseFiles    | ID_Case, ID_File    |
| EventPersons | ID_Event, ID_Person |
| EventFiles   | ID_Event, ID_File   |
| FilePersons  | ID_File, ID_Person  |
+--------------+---------------------+

We can call these the "binary relationship" tables.

Example Scenario (Imaginary)

Here is an imaginary scenario to demonstrate an example of his data structure. The scenario involves three cases, seven people, three events, and five files.

Let's say he conducted two interviews with people about their love lives. One was an interview with four people: George, Sally, Henry, and Elliot. The second interview was with two people, John and Liz. The first interview was recorded on video and the second just in audio. A portion of the first interview was conducted and separately recorded with just Sally and Henry. Although Liz was at the second interview, she didn't speak and so is not in the recording. In the first interview, George shared a copy of a love letter to Sally, which was scanned into a pdf file. Finally, the third event was a phone meeting with a colleague, which was about technique, and no specific cases were discussed.

Data from this imaginary scenario are shown in the tables below. Next to the name of each table, I give the number of records in the actual table in the database.

Four Base Tables

Table [cases] (386 records)

+---------+---------------------------------+
| ID_Case |           Description           |
+---------+---------------------------------+
|       1 | A husband and wife are in love. |
|       2 | A man and woman are in love.    |
|       3 | A man has never been in love.   |
+---------+---------------------------------+

Table [persons] (1,472 records)

+-----------+-----------+----------+
| ID_Person | NameFirst | NameLast |
+-----------+-----------+----------+
|         1 | George    | Brown    |
|         2 | Sally     | White    |
|         3 | Henry     | Green    |
|         4 | John      | Baker    |
|         5 | Liz       | Jones    |
|         6 | Elliot    | Brooks   |
|         7 | Catherine | Drake    |
+-----------+-----------+----------+

Table [events] (526 records)

+----------+------------+---------------+
| ID_Event |    Date    |   Location    |
+----------+------------+---------------+
|        1 | 2016 06 01 | 123 Main St.  |
|        2 | 2016 07 02 | 456 Block Rd. |
|        3 | 2016 08 03 | Phone         |
+----------+------------+---------------+

Table [files] (1,748 records)

+---------+---------------------------+------+
| ID_File |         Filename          | Type |
+---------+---------------------------+------+
|       1 | Brown, Brooks interview   | avi  |
|       2 | White, Green subinterview | avi  |
|       3 | Brown letter              | pdf  |
|       4 | Baker interview           | wav  |
|       5 | Drake meeting             | wav  |
+---------+---------------------------+------+

Six Binary Relationship Tables

In the listings of binary relationship tables, I replace the keys for persons, events, and files with related text to make them easier to read. MS Access allows the same behavior by displaying a table field according to a query. Similarly, the roles shown in some of the tables are text displays of numeric foreign keys to separate tables of allowed roles.

Table [CasePersons] (720 records)

+---------+-----------+---------------+
| ID_Case | ID_Person |     Role      |
+---------+-----------+---------------+
|       1 | George    | Husband       |
|       1 | Sally     | Wife          |
|       1 | Henry     | Wife's friend |
|       2 | John      | Boyfriend     |
|       2 | Liz       | Girlfriend    |
|       3 | Elliot    | Individual    |
+---------+-----------+---------------+

Table [CaseEvents] (299 records)

+---------+------------+
| ID_Case |  ID_Event  |
+---------+------------+
|       1 | 2016 06 01 |
|       2 | 2016 07 02 |
|       3 | 2016 06 01 |
+---------+------------+

Table [CaseFiles] (301 records)

+---------+---------------------------+
| ID_Case |          ID_File          |
+---------+---------------------------+
|       1 | Brown, Brooks interview   |
|       1 | White, Green subinterview |
|       1 | Brown letter              |
|       2 | Baker interview           |
|       3 | Brown, Brooks interview   |
+---------+---------------------------+

Table [EventPersons] (700 records)

+------------+-----------+-------------+
|  ID_Event  | ID_Person |    Role     |
+------------+-----------+-------------+
| 2016 06 01 | George    | Interviewed |
| 2016 06 01 | Sally     | Interviewed |
| 2016 06 01 | Henry     | Interviewed |
| 2016 06 01 | Elliot    | Interviewed |
| 2016 07 02 | John      | Interviewed |
| 2016 07 02 | Liz       | Present     |
| 2016 08 03 | Catherine | Present     |
+------------+-----------+-------------+

Table [EventFiles] (1,490 records)

+------------+---------------------------+-----------+
|  ID_Event  |          ID_File          |   Role    |
+------------+---------------------------+-----------+
| 2016 06 01 | Brown, Brooks interview   | Recording |
| 2016 06 01 | White, Green subinterview | Recording |
| 2016 06 01 | Brown letter              | Received  |
| 2016 07 02 | Baker interview           | Recording |
| 2016 08 03 | Drake meeting             | Recording |
+------------+---------------------------+-----------+

Table [FilePersons] (2,392 records)

+---------------------------+-----------+-------------+
|          ID_File          | ID_Person |    Role     |
+---------------------------+-----------+-------------+
| Brown, Brooks interview   | George    | Interviewed |
| Brown, Brooks interview   | Sally     | Interviewed |
| Brown, Brooks interview   | Henry     | Interviewed |
| Brown, Brooks interview   | Elliot    | Interviewed |
| Brown letter              | George    | Writer      |
| Brown letter              | Sally     | Subject     |
| White, Green subinterview | Sally     | Interviewed |
| White, Green subinterview | Henry     | Interviewed |
| Baker interview           | John      | Interviewed |
| Drake meeting             | Catherine | Present     |
+---------------------------+-----------+-------------+

The problem is that there's a lot of redundancy in the data entered in all these tables. With hundreds of interviews, it's easy to make mistakes, either by entering some data wrong or forgetting to enter all the data in all six binary relationship tables. Once the data are entered, it's hard to check them for errors. Further, redundancies make it harder to find meaningful patterns in the data.

Single Four-Way Relationship Table

I've been trying to help him figure out how to represent these data in a more manageable fashion. One idea I've been exploring is to combine the six binary relationship tables into a single table with the four key fields: ID_Case, ID_Person, ID_Event, and ID_File. The above data then become:

Table [CasePersonEventFiles] (??? records)

+---------+-----------+------------+---------------------------+-----------------+---------------+-------------+-------------+
| ID_Case | ID_Person |  ID_Event  |          ID_File          |     CP_Role     |    EP_Role    |   EF_Role   |   FP_Role   |
+---------+-----------+------------+---------------------------+-----------------+---------------+-------------+-------------+
|       1 | George    | 2016 06 01 | Brown, Brooks interview   | Husband         | Interviewed   | Recording   | Interviewed |
|       1 | Sally     | 2016 06 01 | Brown, Brooks interview   | Wife            | Interviewed   | * Recording | Interviewed |
|       1 | Henry     | 2016 06 01 | Brown, Brooks interview   | Wife's friend   | Interviewed   | * Recording | Interviewed |
|       1 | Sally     | 2016 06 01 | White, Green subinterview | * Wife          | * Interviewed | Recording   | Interviewed |
|       1 | Henry     | 2016 06 01 | White, Green subinterview | * Wife's friend | * Interviewed | * Recording | Interviewed |
|       1 | George    | 2016 06 01 | Brown letter              | * Husband       | * Interviewed | Received    | Writer      |
|       1 | Sally     | 2016 06 01 | Brown letter              | * Wife          | * Interviewed | * Received  | Subject     |
|       2 | John      | 2016 07 02 | Baker interview           | Boyfriend       | Interviewed   | Recording   | Interviewed |
|       2 | Liz       | 2016 07 02 | 0                         | Girlfriend      | Present       |             |             |
|       3 | Elliot    | 2016 06 01 | Brown, Brooks interview   | Individual      | Interviewed   | * Recording | Interviewed |
|       0 | Catherine | 2016 08 03 | Drake meeting             |                 | Present       | Recording   | Present     |
+---------+-----------+------------+---------------------------+-----------------+---------------+-------------+-------------+

Obviously, this is much cleaner and more compact. The number of data rows have been reduced from 36 in six tables to eleven in one. Reduction in the number of data elements (non-null cells) is less dramatic, from 100 to 85. Related data are in close proximity that makes it easier to avoid errors, or to see any that exist.

In this table, zeros are used instead of nulls in order to allow the first four fields to form a primary key, ensuring uniqueness across those four fields. In the field names, "CP_Role", for example, means "Case-person role", i.e., the role of the person in the case.

Redundancy is reduced, but not eliminated. Thirteen data elements that are redundant with others are marked in this example with asterisks. Such redundancies are a pernicious invitation to data-entry error. However, it is fairly straightforward to automatically check for such errors in this table.

The big problem in implementing this four-way relationship table at this stage is in collapsing the data into it from the six binary tables. It's easy to create a table with all the necessary fields and then append all the rows of the six tables into it with zero values for absent keys. This makes a combined table of 5,902 records. If the numbers in the example scenario above can be extrapolated to the real data, then the number of records can be reduced to about 1,800, meaning over 4,000 records can be removed! I'm currently studying patterns in the combined table, seeking ways to automate the merging and removal of groups of superfluous rows. It's very slow-going.

Worse than the amount of time it's taking to make this new table is that I'm not fully confident that the result will be the optimal representation of the data.

The Question

Am I missing something important in this approach? Is there a more intelligent way to manage this collection of data? Does database theory provide a better way to represent these relationships?

philipxy
  • 14,867
  • 6
  • 39
  • 83
NewSites
  • 1,402
  • 2
  • 11
  • 26
  • Might be good question for https://dba.stackexchange.com/. Maybe. – wazz Oct 19 '17 at 21:17
  • I wondered about that. Is it okay to post it in both places? Or can I make a post there that links to the question here? What is the right thing to do when a question is appropriate for more than one StackExchange board? – NewSites Oct 19 '17 at 21:34
  • Not sure. Someone else plz? – wazz Oct 19 '17 at 21:59
  • Okay, I just read up on cross-posting (https://meta.stackexchange.com/questions/64068/is-cross-posting-a-question-on-multiple-stack-exchange-sites-permitted-if-the-qu). Answer: Not allowed. So if I don't get something helpful here in a few days, I think I'll follow one of Sherwood Botsford's suggestions there. In the meantime, I hope one of the database geniuses here will take interest. – NewSites Oct 19 '17 at 22:05
  • 2
    Time to read a textbook on info modeling & DB design. PS You are in some sense reducing certain redundancy, but also you are increasing certain problematic redundancy, which you addressed, though not as redundancy. You are "denormalizing". PS *Tables* represent application/business relation(ship)s/associations. "Relationships" [sic--though ubiquitous] between tables are constraints, ie truths, ie instances of the relationship "has subrows that must also appear in", ie FKs. (Each is associated with a binary relation(ship)/association that is expressible in terms of its referenced table's.) – philipxy Oct 21 '17 at 13:10
  • @philipxy - Thanks for saying the word for what I've been doing. From reading up on it, I see it's the opposite of what I want. Reduced number of rows gave me the ILLUSION of reduced redundancy, when actually denormalization increases it. I see now what I called "binary relationship" tables are the result of normalization, which is what I need. But normalization can still leave some redundancy. I need to go back to those six tables and nail down the problem with them, and then possibly post another question on that. In the meantime, an answer with the right approach to these data is welcome! – NewSites Oct 22 '17 at 01:31
  • 1
    There's no absolute benefit for any particular decomposition or (re)composition/denormalization. Normalizaiton says some combinations are fine, say nothing about others, and says others are problematic; you might nevertheless want a problematic one, while managing the problems. And you might benefit from having set-, list- or table-valued columns or from null-valued columns. You must identify *FDs* (functional dependencies) (stating a column's value is a function of others in its row) & *JDs* (join dependencies) (stating a table equals the join of projections of it). Textbook. – philipxy Oct 22 '17 at 02:30
  • 1
    [Something I just wrote.](https://stackoverflow.com/a/46869845/3404097) Maybe google my other answers re predicates, database design & normalization. [Beware.](https://stackoverflow.com/a/40640962/3404097) Some [good stuff](https://stackoverflow.com/a/24007275/3404097). – philipxy Oct 22 '17 at 02:40
  • As stated in the post, "In the listings of binary relationship tables, I replace the keys for persons, events, and files with related text to make them easier to read. MS Access allows the same behavior by displaying a table field according to a query." – NewSites Oct 23 '17 at 09:16
  • (Thanks, I commented re replacements without re-reading the whole question. Although it is still unclear whether the replacement is for presentation in this question or part of your proposed design.) What do you want to happen when the replacement introduces ambiguities? (They are only "text displays of numeric foreign keys" if 1:1 with FK values, ie are AKs/CKs of their tables.) Do you want to add a disambiguation to the displayed pseudo-id (eg also show the id), or prevent ambiguous cases, or leave the ambiguity? To transform to introduce ambiguity is to throw away info & is a bad idea. – philipxy Oct 27 '17 at 06:00
  • wondering if you have moved this question or left it here to be resolved? Also, is there a goal to collected the data in this way? are there reports or queries you want to be able to run to extract a particular form of information? – EmRoBeau Dec 21 '17 at 19:15
  • I've been jammed with another project and intend to return and follow up here when that's done, soon. I'll edit the question with an answer to your second question then. – NewSites Dec 22 '17 at 04:10

1 Answers1

0

You can find how to structure the database in any database book. for instance I came up with:

enter image description here

Next we need some data entry forms. the data entry forms are designed to take the case data and sort it into the appropriate database tables. So the data enterer does not need to know about the structure of the database and only has to enter the actual data. Information on making data entry forms for 1 to many and many to many forms is hard to find. Here is a link:

create form to add records in multiple tables

here is a data entry form and eventual subform I made for this data: enter image description here

the 1 side of a 1 to m relationship is the form and the m side is the subform. Here I dragged a file/persons form created with the wizard onto a files form that was also created using the wizard. to be able to represent the second 1 to m relationship we replace the textbox on the form which holds the foreign key with a combo box. Here ID_Person has been replaced with a combo box so the data enterer doesn't have to know anything about ID_person but just selects the persons name instead. The result is the data enterer doesn't need to know anything about the keys underlying relationships in the database and can just enter the case data.

mazoula
  • 1,221
  • 2
  • 11
  • 20
  • Thank you. I'm coming back to this project after a long delay and your answer came just in time. I was thinking I needed to use a data entry form but I was trying to avoid getting into that. From the complexity of the process that you linked to, I was right to want to avoid it. But hopefully, the process is laid out there well enough to help me do it straightforwardly. I'll report back when I get it done. – NewSites Jan 23 '19 at 02:26
  • Figure out how to give me an email address and I'll just send you the temp database i made :) – mazoula Jan 24 '19 at 06:29
  • A Meta answer (https://meta.stackexchange.com/a/66652/373133) said a way to do this is to use a throw-away address. So you can write to me at `Temp4Mazoula@Yahoo.com`. I will respond. Thank you. – NewSites Jan 24 '19 at 16:44