For example, suppose we're conducting research where students can take up to 10 different tests, and each table in the database stores all the students' responses for one test. The tables are named after each test as: T1, T2, ... , T10. Suppose each table has a primary key column 'Username' that identifies each student. Students may or may not have completed each test, so there may or may not be a record in each table for each student.
What is the correct SQL Query to return all the test data from all tables, with one row per student (one row per username)? I want the simplest query possible that returns the correct results. I would also like to coalesce the Username fields into a single Username field in the final query.
To clarify, I understand that SQL has a major limitation in that it does not support a syntax to select all columns except one or more fields like "select *[^ExcludeColumn1][^ExcludeColumn2]". To avoid specifically naming all columns in the final query, it would be acceptable to leave all the Username columns there, as long as it includes a coalesced Username field at the beginning named something like RowID.
As for the overall query, one option would be to perform a union all on the username column of all ten tables, then select the distinct usernames across all tables, then perform a series of left joins against the list of distinct usernames on all 10 tables. That would result in a very straightforward query where each left join is performed on the same distinct set of usernames, but I want to avoid a separate up-front query for distinct usernames. (Although if that's the best option, let me know). It would look something like this:
select * from
(select distinct coalesce(t1.Username,t2.Username,...,t10.Username) as RowID from t1,t2,t3,t4,t5,t6,t7,t8,t9,t10) distinct_usernames
left join t1 on t1.Username = distinct_usernames.RowID
left join t2 on t2.Username = distinct_usernames.RowID
...
left join t10 on t10.Username = distinct_usernames.RowID
Although that is short and easy to write, it is incredibly inefficient and would take hours to run on test tables with 5000+ rows each, so with an adjustment, an equivalent version that runs in a few seconds is:
select * from (
select distinct Username as RowID from (
select Username from t1
union all
select Username from t2
union all
...
select Username from t10
) all_usernames) distinct_usernames
left join t1 on t1.Username = distinct_usernames.RowID
left join t2 on t2.Username = distinct_usernames.RowID
...
left join t10 on t10.Username = distinct_usernames.RowID
I think that what I have above might be the most efficient and correct query (takes only a couple seconds to run and returns correct result set), but I also thought perhaps it could be simplified with some kind of full join. The problem is that full joins get confusing with more than two tables, because without pre-determining the usernames, each subsequent table would have to match records against any of the preceding tables, resulting in a query where each additional table has "[previous table count] + 1" conditions on matching the username.