0

SQL 5.7.37

I'm using this code to merge 2 sql tables into a third table. But now I need to do the same thing, with 5 or more tables.

create table table3 as
    select *
    from table1
    union all
    select *
    from table2
    where not exists (select 1 from table1 where table1.title = table2.title);

How do I add more tables to this command?


Here's an example set of data and the desired result:

Notice the tables have some rows that are exactly the same as other tables. I'm not sure if that matters.

Structure: ID (primary key, auto increment), Title (unique,index), DESC, URL

Table 1:

ID Title DESC URL
1 Bob thisisbob bob.com
2 Tom thisistom tom.com
3 Chad thisischad chad.com

Table 2:

ID Title DESC URL
1 Chris thisischris chris.com
2 Chad thisischad chad.com
3 Dough thisisdough doug.com

Table 3:

ID Title DESC URL
1 Morgan thisismorgan morgan.com
2 Jerome thisisjerome jerome.com
3 Mike thisismike mike.com

Table 4:

ID Title DESC URL
1 Chris thisischris chris.com
2 Chad thisischad chad.com
3 Luke thisisluke luke.com

Result (what I need in Table 5):

ID Title DESC URL
1 Bob thisisbob bob.com
2 Tom thisistom tom.com
3 Chad thisischad chad.com
4 Chris thisischris chris.com
5 Dough thisisdough doug.com
6 Morgan thisismorgan morgan.com
7 Jerome thisisjerome jerome.com
8 Mike thisismike mike.com
9 Luke thisisluke luke.com

How can I add more tables to my union sql command?

lemon
  • 14,875
  • 6
  • 18
  • 38
Mae Webb
  • 11
  • 2
  • There essentially isn't any restriction on how many times you can use UNION, but I think it might be better for you to union all the non PK columns and provide a renumbering to form a new PK. What version of mysql? – Caius Jard Apr 09 '22 at 09:32
  • @CaiusJard would I repeat the 'where not exists' line as well? That's where I get lost. – Mae Webb Apr 09 '22 at 09:34
  • 1
    *Notice the tables have some rows that are exactly the same as other tables. I'm not sure if that matters.* - and you want them deduping? – Caius Jard Apr 09 '22 at 09:35
  • @CaiusJard yes, dont want it to copy dupes based on the unique title column SQL 5.7.37 – Mae Webb Apr 09 '22 at 09:37
  • If the source tables are not too huge then use UNION DISTINCT. Else use separate CREATE TABLE with complete table definition and according UNIQUE constraint, and 5 separate INSERT IGNORE. – Akina Apr 09 '22 at 09:57
  • @Akina, I tried distinct but the tables were too large and it was timing out. The code at the top of the question still merged them very quickly though, but only for 2 tables. My actual tables have over 50 columns and I'm not sure create table will work well. When I try to get the Fulltext in phpmyadmin it says " #1070 - Too many key parts specified; max 16 parts allowed" – Mae Webb Apr 09 '22 at 10:11
  • Another option could be to gather all united data first (using UNION_ALL) and then select distinct values once, as it is more efficient than doing 4 UNION DISTINCT. Can this solution work for you? – lemon Apr 09 '22 at 10:27

1 Answers1

0

An idea could be to postpone the generation of every unique value only after you've aggregated all your tables, as long as UNION_ALL is way faster than UNION, and you would do the DISTINCT operation once instead of five times in this way.

SET @cnt = 0;

SELECT (@cnt := @cnt + 1) AS rowNumber,
       distinct_aggr_tables.* 
FROM (SELECT DISTINCT * 
      FROM (SELECT `Title`, `DESC`, `url` FROM Table1
            UNION ALL
            SELECT `Title`, `DESC`, `url` FROM Table2
            UNION ALL
            SELECT `Title`, `DESC`, `url` FROM Table3
            UNION ALL
            SELECT `Title`, `DESC`, `url` FROM Table4) aggr_tables
     ) distinct_aggr_tables

Check the demo here.


EDIT: How to select all fields except one from a MySQL table for this task

There are two interesting ways of doing this:

1) The first approach copies each table into different temporary tables, then using the ALTER statement to drop the column we're not interested in, hence using these tables with the first version of this code.

# for each table
CREATE TEMPORARY TABLE temp_Table1 AS
SELECT * FROM Table1;

ALTER TABLE temp_Table1
DROP Id;

2) The second approach uses a prepared statement, which allows you to build the query as a string. This can help for this exercise because we may want to, retrieve all column names from INFORMATION_SCHEMA.COLUMNS table within a query, and then remove the field name we're not interested in, hence pass the list of column names to the original query.

SET @sql = CONCAT(
    'CREATE OR REPLACE VIEW AllTables AS ', 
    
    'SELECT ROW_NUMBER() OVER(ORDER BY Title ASC) AS rowNumber,
            distinct_aggr_tables.* 
     FROM (SELECT DISTINCT * 
           FROM (SELECT ',
    
    (SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 
                           'Id`,', ''), 
                   '` ')
     FROM INFORMATION_SCHEMA.COLUMNS cols
     WHERE cols.TABLE_NAME = 'Table1' AND cols.TABLE_SCHEMA = 'test'),
    
    'FROM Table1 
     UNION ALL
     SELECT ',
                    
    (SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 
                           'Id`,', ''), 
                   '` ')
     FROM INFORMATION_SCHEMA.COLUMNS cols
     WHERE cols.TABLE_NAME = 'Table2' AND cols.TABLE_SCHEMA = 'test'),
    
    'FROM Table2
     UNION ALL
     SELECT ',
     
    (SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 
                           'Id`,', ''), 
                   '` ')
     FROM INFORMATION_SCHEMA.COLUMNS cols
     WHERE cols.TABLE_NAME = 'Table3' AND cols.TABLE_SCHEMA = 'test'),
               
     'FROM Table3
      UNION ALL
      SELECT ',
                    
     (SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 
                            'Id`,', ''), 
                    '` ')
      FROM INFORMATION_SCHEMA.COLUMNS cols
      WHERE cols.TABLE_NAME = 'Table4' AND cols.TABLE_SCHEMA = 'test'),
      
     'FROM Table4) aggr_tables) distinct_aggr_tables;'
);
                     
PREPARE stmt FROM @sql;
EXECUTE stmt;

SELECT * FROM AllTables;

Note that this code reproduces exactly the first code of this post, except for the fact that uses a ROW_NUMBER window function instead of a global variable that updates itself.

This solution makes some assumptions, according to which it should be carefully quick-fixed:

  • the tables are exactly 4: in order to change this amount, it's necessary to replicate the following code in the right place for each new table:
SELECT ',
    (SELECT CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 
                           'Id`,', ''), 
                   '` ')
     FROM INFORMATION_SCHEMA.COLUMNS cols
     WHERE cols.TABLE_NAME = <new_table_name> AND cols.TABLE_SCHEMA = 'test'),
FROM <new_table_name>
  • the current table names are Table1, Table2, Table3, Table4 and the database name is test: these references should be replaced when we're looking for the field names of a specific table (filtering by table name and database name):
SELECT '
   (SELECT CONCAT ...
    FROM ...
    WHERE cols.TABLE_NAME = <table_name> AND cols.TABLE_SCHEMA = <db_name>),
'FROM <table_name>
  • the field name to remove is 'Id' and it is found as first column of all the tables: if the name is different, it's necessary to change its name during the removal of this column. Moreover if this is not the first column, some tweaks are needed here:
# COLUMN_NAME: 
# ['Id', 'Title', 'DESC', 'url']
#
# GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'):
# 'Id`,`Title`,`DESC`,`url'
# 
# REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', '')
# '`Title`,`DESC`,`url'
# 
# CONCAT(REPLACE(GROUP_CONCAT(COLUMN_NAME SEPARATOR '`,`'), 'Id`,', ''), '` ')
# '`Title`,`DESC`,`url`'

(backticks are added to avoid exceptions due to DESC)

Note1: The generation of the four SELECT groups for each table may be automated (a simple example at the bottom of this page) by cycling with a variable on the table names contained in INFORMATION_SCHEMA.TABLES. Yet I wouldn't venture forth that path as it becomes difficult to deal with the string text to be evaluated with the prepared statement and the CONCAT of a calculated value from another table (INFORMATION_SCHEMA.COLUMNS cols).

Note2: Couldn't see the effects of this code within sql fiddles because wasn't able to access the INFORMATION_SCHEMA db tables. The code has been tested offline on a MySQL 8.0 database.


The first approach can be quite memory expensive, while the second one may be more efficient if handled carefully in the fixes to tailor your db.

Overall no perfect solution, though some that may solve your problem.

ps: any suggested edits to improve this post are more than welcome.

lemon
  • 14,875
  • 6
  • 18
  • 38
  • is there any way to do this without specifying all of the columns? – Mae Webb Apr 09 '22 at 11:33
  • Yes, there are a couple of ways of doing that: the former is using a prepared statement (more complex but efficient), the latter is using a temporary table that selects all columns, then apply an alter table that drops the column you don't want. I'll update the answer accordingly soon, though I may need the name of the database – lemon Apr 09 '22 at 12:53
  • Note that the requirement is to distinct by one particular column, not all columns – Caius Jard Apr 09 '22 at 14:30
  • @CaiusJard I thought he meant that there are some rows in different tables for which all columns are the same by `Notice the tables have some rows that are exactly the same as other tables.`. Can you double check on this point for understanding @MaeWebb? – lemon Apr 09 '22 at 18:43
  • 1
    I'd inferred "distinct by title" from the EXISTS in the "working" attempt.. There was also a latter posted comment to me that seemed to indicate title was the only discriminator.. but I agree it's worth a double check because I don't think any existing sample data is conclusive for it – Caius Jard Apr 09 '22 at 18:51