I'm sorry if this is a stupid question, but I can't seem to get my head around it. I'm fairly new to SQL, and this behavior would be strange in R, Pandas, or the other tools I'm used to.
Basically, I have two tables in two different databases, with a common key user_id. I want to join all the columns with:
SELECT * FROM db1.first_table t1
JOIN db2.second_table t2
ON t1.user_id = t2.user_id
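
For context, the pyspark side looks roughly like this (a simplified sketch; I'm assuming the tables are already visible to Spark, and the output path and CSV format are just placeholders for my real flat-file export):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join_example").getOrCreate()

# Run the same SQL as above; both tables are assumed to be registered with Spark.
joined = spark.sql("""
    SELECT * FROM db1.first_table t1
    JOIN db2.second_table t2
    ON t1.user_id = t2.user_id
""")

# This is the step that fails for me: the result has two user_id columns,
# and writing it out as a flat file (CSV here) complains about the duplicate name.
joined.write.mode("overwrite").option("header", True).csv("/tmp/joined_flat_file")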
Great, it works. Except there are two (identical) columns called user_id. This wouldn't really matter, except that I am doing this in pyspark, and when I try to export the joined table to a flat file I get an error because two of the columns have the same name. There are work-arounds for this, but I'm just wondering if someone can explain why the join returns both user_id columns. It seems like it is an inner join, so by definition the columns are identical. Why would it return both?
As a side question, is there an easy way to avoid this behavior?
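
As a side note on the work-arounds I mentioned: the one I've found so far is to join on the column name instead of an expression (or to use a USING clause in the SQL), which seems to leave only one user_id column. A rough sketch, with the same placeholder names as above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join_workaround").getOrCreate()

first_table = spark.table("db1.first_table")
second_table = spark.table("db2.second_table")

# Joining on the column *name* (instead of t1.user_id == t2.user_id)
# leaves a single user_id column in the result.
joined = first_table.join(second_table, on="user_id", how="inner")
joined.printSchema()  # only one user_id column now

# The pure-SQL version of the same idea is the USING clause, which also
# returns the join key only once:
joined_sql = spark.sql(
    "SELECT * FROM db1.first_table JOIN db2.second_table USING (user_id)"
)

# Either result can then be exported without the duplicate-column error.
joined.write.mode("overwrite").option("header", True).csv("/tmp/joined_flat_file")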
Thanks in advance!