0

I'm using MS Access with a MySQL database. One table has records which represent 'documents':

Table: doc
Primary Key: doc_id

These records can be linked together via a link table:

Table: link
Primary Key: link_id
Foreign Key: doc_id_A
Foreign Key: doc_id_B

Thus the records may be linked in a chain, eg Doc A linked to Doc B, which is linked to Doc C, etc, and also, Doc A may be linked to any number of other documents.

In reality, the 'family' of inter-related documents wouldn't exceed 20 records.

I'm looking for an efficient MySQL proc, or VBA function - or a query - to find all the members of a 'family' for one specified record.

Any suggestions would be most welcome!

maxhugen
  • 1,870
  • 4
  • 22
  • 44
  • Is this any use? http://explainextended.com/2009/03/17/hierarchical-queries-in-mysql/ – Fionnuala Nov 30 '11 at 00:55
  • Hey Remou, thanks mate, going to explore your link more tomorrow. – maxhugen Nov 30 '11 at 06:33
  • Hey Remou, thanks mate (we've chatted and resolved things together before), going to explore your link more tomorrow. After a quick glance though, it looks it can only handle a Parent/Child relationship. In my 'join' table, the doc who's 'family' we want could be either parent or child. This is a non-defined hierarchy (ie, there isn't a hierarchy at all, the records are just linked). I currently have a union join that requires six queries to get to just 2 levels deep. Being an Aussie, I'd happily offer you a slab or two (of beer) to help me with this. – maxhugen Nov 30 '11 at 06:47
  • Hey Max If any of the small clots of brain matter I have left start to function, you will hear from me, but it is a long shot. Beer will do fine :) – Fionnuala Nov 30 '11 at 10:38
  • Unfortunately, I can't see any way to use the hierarchical-type queries you pointed me to, as the links in my case do not represent a parent/child relationship. Thanks anyway. If you have any other ideas... ? – maxhugen Dec 06 '11 at 23:30
  • If two documents (say DocID=10 and DocID=20) are linked, how many records in the "link" table represent that relationship? One or two? If it is one, do you implement a rule like, `doc_id_A` is always less than `doc_id_B`, etc? – mwolfe02 Dec 07 '11 at 16:44
  • Why not refactor database design: create a docFamily table and, if this is a parent-child relation, create foreign key on document, and if it is many-to-many, then, create a new table document-family with both foreigns keys? – dani herrera Dec 07 '11 at 23:37

3 Answers3

1

Unfortunately, it cannot be done in MySQL given such a table structure as MySQL does not support recursive queries. I suggest to explore the answers in Mysql recursion? - there is some advice as to how to store the data in MySQL to be able to write such queries.

You also have three other options:

In case you know the max. depth of such families, and it's not huge, you can still achieve it with mySQL. For a depth of 3 levels, it would look like this:

SELECT A.doc_id_a as a, B.doc_id_a a_child, C.doc_id_a as a_sub_child 
FROM links as A, links as B, links as C 
WHERE A.doc_id_a = your_doc_id AND
      A.doc_id_b = B.doc_id_a AND
      B.doc_id_b = C.doc_id_a

Following the same logic, you can add as many layers as you need. The only thing is - you'll need to get result from all columns and find unique values (and there can be many rows, if the relationship is not 1-1 all the time).

A second option is to do this in VBA. I do not know enough to provide the code, but essentially it could look like this (a recursive approach):

family = array();
family = getFamily('your_doc_id', family);

function getFamily(id) {
   children = DB->getColumn('SELECT doc_id_b FROM links WHERE doc_id_a = ?', id);
   if (empty(children)) return family;
   else {
     foreach (children as child) {
       family[] = getFamily(child);
     }
   }
 }

Lastly, you can switch to PostgreSQL, which supports recursive queries :) (http://www.postgresql.org/docs/8.4/static/queries-with.html).

Community
  • 1
  • 1
Aurimas
  • 2,518
  • 18
  • 23
  • Thanks Aurimas, I'll check out your suggested link, and try your suggestions. Let you know how I go! – maxhugen Dec 08 '11 at 09:01
1

So the link table gives a self join to doc, you absolutely need a doc to be able to:

  1. Link to itself
  2. ... as many times as it likes
  3. Link to another doc
  4. ... multiple times
  5. Be separately linked to by the same other doc
  6. ... multiple times
  7. Store information about the link

So your link table could have 10 separate links 1-1, 1-1, 1-1, 1-2, 1-2, 1-2, 2-1, 2-1, 2-1, 2-2 etc. with just 2 docs in the 'family'.


I expect when you look at this list you will probably think that you don't need most of them, a lot of the inefficiency in your solution might be coming from this unnecessary flexibility. My favoured suggestion would be to start with a strict hierarchy and build minimally from there.

But here is my answer anyway, it's tested and working in Access-2010 and local tables. ADODB should work just as well with linked tables.

Option Compare Database
Option Explicit
Const MaxInFamily = 30

'Requires a reference to "Microsoft ActiveX Data Objects 2.x Library" (VBA Menu: Tools, references)

Function GetFamily(id As Long) As Long()
Dim Found(MaxInFamily) As Long
Dim MaxFound As Integer
Dim CurrentSearch As Integer
Dim Sql As String
Dim rs As New ADODB.Recordset

    Found(1) = id
    MaxFound = 1

    For CurrentSearch = 1 To MaxInFamily

        If CurrentSearch > MaxFound Then Exit For

        Sql = "SELECT doc_id_2 as NewID FROM link WHERE doc_id_1 = " & Found(CurrentSearch) _
                & " AND doc_id_2 NOT IN (" & ArrayToCsv(Found, MaxFound) & ")" _
                & " UNION " _
                & " SELECT doc_id_1 FROM link WHERE doc_id_2 = " & Found(CurrentSearch) _
                & " AND doc_id_1 NOT IN (" & ArrayToCsv(Found, MaxFound) & ")"

        rs.Open Sql, CurrentProject.Connection

        Do While Not rs.EOF
            MaxFound = MaxFound + 1
            Found(MaxFound) = rs("NewID").Value
            rs.MoveNext
        Loop

        rs.Close

    Next CurrentSearch

    GetFamily = Found

End Function

Function ArrayToCsv(SourceArray() As Long, ItemCount As Integer) As String
Dim Csv As String
Dim ArrayIndex As Integer

    For ArrayIndex = 1 To ItemCount
        Csv = Csv & SourceArray(ArrayIndex)
        If ArrayIndex < ItemCount Then Csv = Csv & ", "
    Next ArrayIndex

    ArrayToCsv = Csv

End Function

Duplicated results and queries are avoided by excluding items already found at the server, and as the link is unidirectional I've used a UNION query to look both ways at once. At most it'll do MaxInFamily round trips to the server.

Stephen Turner
  • 7,125
  • 4
  • 51
  • 68
  • Thanks webturner. I'm sure you're right about the union join - as you expressed it better than I did, the links are unidirectional. However, your points (1) and (2) don't apply, as a document can't logically link to itself. Now, that array is certainly something I've never thought of, so I'll try some code out on that. – maxhugen Dec 08 '11 at 09:12
  • Webturner, your function worked great. I changed GetFamily to return a string, and changed the result to – maxhugen Dec 11 '11 at 22:37
  • GetFamily = ArrayToCsv(Found, MaxFound) Unfortunately, the function doesn't work directly in a query eg: SELECT * FROM doc WHERE doc_id IN ( GetFamily( [some_doc_id] ) ) Is there something that I'm missing here, or do I need to set the recordsource for forms/reports manually via VBA? – maxhugen Dec 11 '11 at 22:43
  • GetFamily would return an array which SQL can't handle, build the SQL String with `" + ArrayToCsv(GetFamily([some_doc_id])) + "` to make the syntax work. Or you could create a boolean function `WHERE IsDocInFamily(docID, [some_doc_id])` but the built SQL string would probably be much more performant. – Stephen Turner Dec 12 '11 at 10:02
  • Many thanks. I'm building the SQL - as required - like you suggested. Appreciate your solution! – maxhugen Dec 13 '11 at 01:11
1

I had a similar problem some time ago. Instead of creating links between the documents I used a separate table containing the families:

Table: LinkTable
PK: LinkId, DocumentID
FK: DocumentID

Instead of having pairs of document IDs, each time you create a new family, you introduce a new LinkId.

LinkId | DocumentID

1      | 7
1      | 9
1      | 13
2      | 4
2      | 22
2      | 23
2      | 30
3      | 6
3      | 80

Here you have three families with the document IDs:

{ 7, 9, 13 },
{ 4, 22, 23, 30 },
{ 6, 80 }

It is easier to retrieve families, however it requires a somewhat more complicated logic for inserting and removing documents to and from families.

Olivier Jacot-Descombes
  • 104,806
  • 13
  • 138
  • 188
  • Yes, could get a bit complicated, eg if you then linked DocID 9 and 23, you'd have an issue with the family id. – maxhugen Dec 13 '11 at 03:39
  • This is what I meant with "more complicated logic". My implementation was for Oracle and was used in different Access applications. Instead of allowing the applications to access the link table directly, I created stored procedures for inserting and removing entries, in order to ensure the integrity of the link table. E.g., if a document is removed from the link table, then a single document remaining in a family will have to be deleted as well, as it is no more linked to any other documents. – Olivier Jacot-Descombes Dec 13 '11 at 14:48