-3

In SQL, JOIN and INNER JOIN mean the same thing. In Matlab, they are different commands. Just from perusing the documentation thus far, they appear on the surface to fufill the same general function, with possible differences in the details, as controlled by parameters. I am slogging through the individual examples and may (or may not) find the fundamental difference. However, I feel that the difference should not be a subtlety that users have to ferrut out of the examples. These are two separate commands, and the documentation should make it clear up front why they are both needed. Would anyone be able to chime in about the key difference? Perhaps it could become a request to place it front and centre in the documentation.

user36800
  • 2,019
  • 2
  • 19
  • 34

1 Answers1

0

I've empirically characterized the difference between JOIN and INNERJOIN (some would refer to this as reverse engineering). I'll summarize from the perspective of one who is comfortable with SQL. As I am new to SQL-like operations in Matlab, I've only been able to test drive it to a limited degree, but the INNERJOIN appears to join records in the same manner as SQL. Since SQL is a pretty open language, the behavioural specification of INNERJOIN is readily available, and I won't dwell on that. It's Matlab's JOIN that I need to suss out.

In short, from my testing, Matlab's JOIN seems to "join" the rows in the two operand table in a manner more like Excel's VLOOKUP rather than any of the JOINS in SQL. In general, the main differences with SQL joins seem to be (i) that the right hand table cannot have repeating values in the columns used matching up rows between the two tables and (ii) all combinations of values in the key columns of the left hand table must show up in the right hand table.

Here is the empirical testing. First, prepare the test tables:

a=array2table([
   1  2
   3  4
   5  4
],'VariableNames',{'col1','col2'})

b=array2table([
   4  7
   4  8
   6  9
],'VariableNames',{'col2','col3'})

c=array2table([
   2  10
   4  8
   6  9
],'VariableNames',{'col2','col3'})

d=array2table([
   2  10
   4  8
   6  9
   6  11
],'VariableNames',{'col2','col3'})

a2=array2table([
   1  2
   3  4
   5  4
   20 99
],'VariableNames',{'col1','col2'})

Here are the tests:

>> join(a,b)
Error using table/join (line 111)
The key variable for B must have unique values.

>> join(a,c)
ans = col1    col2    col3
      ____    ____    ____
      1       2       10
      3       4        8
      5       4        8

>> join(a,d)
Error using table/join (line 111)
The key variable for B must have unique values.

>> join(a2,c)
Error using table/join (line 130)
The key variable for B must contain all values in the key
variable for A. 

The first thing to notice is that JOIN is not a symmetric operation with respect to the two tables.

It seems that the 2nd table argument is used as a lookup table. Unlike SQL joins, Matlab throws an error if it can't find a match in the 2nd table [See join(a2,d)]. This is somewhat hinted at in the documentation, though not entirely clearly. For example, it says that the key values must be common to both tables, but join(a,c) clearly shows that the tables do not have to have common key values. On the contrary, just as one would expect of a lookup table, 2nd table contains entries that aren't matched do not throw errors.

Another difference with SQL joins is that records that cause the key values to replicate in 2nd table are not allowed in Matlab's join. [See join(a,b) & join(a,d)]. In contrast, the fields used for matching records between tables aren't even referred to as keys in SQL, and hence can have non-unique values in either of the two tables. The disallowance of repeated key values in the 2nd table is consistent with the view of the 2nd table as a lookup table. On the other hand, repetition on of key values are permitted in the 1st table.

user36800
  • 2,019
  • 2
  • 19
  • 34
  • 3
    Do you mean that matlab's `innerjoin` works exactly as SQL's `INNER JOIN`? This wasn't clear, and without that info this is not an answer to your question regarding "the difference between matlab `join` vs `innerjoin`". – Andras Deak -- Слава Україні Dec 06 '15 at 19:22
  • I've modified the opening paragraph in response to your comment. Interesting to see that the answer was down-voted. I can't imagine why. This detail is needed in order to use the command, and isn't all that evident in the man pages. One can only wonder about the true motives for the down votes. – user36800 Dec 06 '15 at 23:13
  • Thanks, I believe it's more complete this way. Regarding your unasked question: neither of the downvotes are mine:) – Andras Deak -- Слава Україні Dec 06 '15 at 23:15
  • I was kind of wondering whether it was...er...corporate interests, as it does not reflect well on the documentation. As I said, however, I'm actually on their side. Too much invested in the language. But I can only provide eyes on the ground as to what might help. They have to do their part. If the response is adversarial, no one benefits, and every "dis-benefits". – user36800 Dec 06 '15 at 23:20
  • 1
    So where's the `innerjoin` in this answer? You tested `join`, good for you; you found the usual errors. But where'd you test that other elusive function you keep talking about? – Adriaan Dec 07 '15 at 16:33
  • I already modified the opening paragraph to explain that the rows that are joined seem to follow SQL's inner joining scheme. The point of this question wasn't to replicate the many explanations that can be provided about SQL's inner join to varying degrees of detail, as that is readily searchable, and there's plenty of good material online. Replicating it all here would dilute the differences that was the focus of this question. This question was posed from the perspective of a SQL user trying to realize similar or related functions in Matlab -- namely, the join function. – user36800 Dec 07 '15 at 23:19
  • By the way, in case it wasn't obvious, the point of the question wasn't unearth "the usual errors" as you refer to it. It was to unearth the differences between JOIN and INNER JOIN that were not readily apparent from the documentation. However, I modified the answer to include observations between Matlab join and SQL joins in *general*. As for your tone, the "other function you keep talking about"....I dunno, it's pretty clear that you're not intending to be constructive, so why *are* you here? – user36800 Dec 07 '15 at 23:53
  • BTW, I noticed (at least in 2015a) that some of the behavioural specification is contained at the very end of the doc page under "Algorithms". It's not all that clearly written, and I would say that it makes sense to me only because I'm familiar with SQL joins, but it certainly isn't algorithmic info in the sense of a sequence of steps to achieve a result. It is behaviour specification (sort of, if it were more precise) and needs to be way up front. – user36800 Dec 07 '15 at 23:54