My join is producing results that I don't understand.
If it's important or relevant, I'm using Parquet Tables in Impala.
What I'm doing is:
create table test1(foo string, bar int) stored as parquet;
create table test2(foo string, bar int) stored as parquet;
insert into test1 values ("something1",1);
insert into test2 values ("something2",2);
Checking to make sure that works:
select * from test1;
Gives me the output:
+----------------------+
| foo | bar |
+----------------------+
| something1 | 1 |
+----------------------+
1 rows
And
select * from test2;
Gives me the output:
+----------------------+
| foo | bar |
+----------------------+
| something2 | 2 |
+----------------------+
1 rows
Everything seems fine so far. But now when I try to join these 2 tables with
select * from test1 left outer join test2 using (foo);
I get:
+---------------------------------------------+
| foo | bar | foo | bar |
+---------------------------------------------+
| something1 | 1 | something2 | 2 |
+---------------------------------------------+
1 rows
That's unexpected. I expected the output to be something1, 1, null, null. Shouldn't the join only happen when test1.foo = test2.foo?
I also tried doing this with syntax join on test1.foo = test2.foo as well as with an inner join and saw the same results.
Can someone please explain to me what's going on here? I reread the documentation and I don't understand why this is happening.