I'm comparing the behavior between SQL engines. Oracle has the behavior I would expect from a SQL engine for full outer joins:
Oracle
CREATE TABLE sql_test_a
(
ID VARCHAR2(4000 BYTE),
FIRST_NAME VARCHAR2(200 BYTE),
LAST_NAME VARCHAR2(200 BYTE)
);
CREATE TABLE sql_test_b
(
NUM VARCHAR2(4000 BYTE),
FIRST_NAME VARCHAR2(200 BYTE),
LAST_NAME VARCHAR2(200 BYTE)
);
INSERT INTO sql_test_a (ID, FIRST_NAME, LAST_NAME) VALUES ('1', 'John', 'Snow');
INSERT INTO sql_test_a (ID, FIRST_NAME, LAST_NAME) VALUES ('2', 'Mike', 'Tyson');
INSERT INTO sql_test_b (NUM, FIRST_NAME, LAST_NAME) VALUES ('20', 'Mike', 'Tyson');
When I execute the following, it gives me the expected result. The resulting table contains two rows, with one of the rows containing NULL
for the NUM
field, because there is no john snow in the table sql_test_b
.
SELECT A.FIRST_NAME, A.LAST_NAME, A.ID, B.NUM
FROM
SQL_TEST_A A
FULL OUTER JOIN
SQL_TEST_B B
ON
A.FIRST_NAME = B.FIRST_NAME
AND
A.LAST_NAME = B.LAST_NAME;
You can test the sql script here: http://sqltest.net/
HIVE
In HIVE, however, if you were to try the same thing, the full outer join results in a table with two rows. The row that should be the "John Snow" row contains NULL
for the fields FIRST_NAME, LAST_NAME, and NUM. The 1
is filled in for ID
, but that's it.
Why such weird behavior in HIVE? Is this a bug? Or am I missing something...because Oracle 11g seems to handle this much better. Thanks.