5

I want to get all table names which has 3 specific columns.

What I want is from information schema I want to get all table names which contains columnA AND columnB AND columnC.

Currently I am using a query like

SELECT DISTINCT TABLE_NAME 
  FROM INFORMATION_SCHEMA.COLUMNS
    WHERE COLUMN_NAME='columnA'
        AND TABLE_SCHEMA='mysampledatabase';

How can I extend the above query and select tables which contains column names columnA and columnB ?



I came across this answer and other similar answers. They have answered following:

SELECT DISTINCT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME IN ('columnA','ColumnB') AND TABLE_SCHEMA='YourDatabase';

However what they select any table which has either of the 3 columns. I believe that is because of WHERE COLUMN_NAME IN ('columnA','ColumnB','ColumnC') which , i believe, is equivalent to COLUMN_NAME = 'columnA' OR COLUMN_NAME = 'columnB' OR COLUMN_NAME = 'columnC'



I have tried to replace the WHERE clause with WHERE COLUMN_NAME='columnA' AND COLUMN_NAME='columnB' AND COLUMN_NAME='columnC' which obviously doesn't return any results because COLUMN_NAME can only be 1 value at a time!

Community
  • 1
  • 1
Nis
  • 1,469
  • 15
  • 24

2 Answers2

4

This is a faster version:

SELECT TABLE_NAME 
      FROM INFORMATION_SCHEMA.COLUMNS
      WHERE COLUMN_NAME IN ('columnA','columnB','columnC')
group by TABLE_NAME
having count(distinct COLUMN_NAME) = 3 

In above query having count will have no. of columns specified in IN phrase.

or try following(but it will create disaster for so many columns. So avoid it)

SELECT DISTINCT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME IN 
( SELECT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME = 'columnA' )
and TABLE_NAME in 
( SELECT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME = 'columnB')
and TABLE_NAME in 
( SELECT TABLE_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME = 'columnC')
Dhwani
  • 7,484
  • 17
  • 78
  • 139
  • Thank you for your response. Your solution 2 would be very slow because it traverses `INFORMATION_SCHEMA.COLUMNS` table, which is usually huge, 3 times. What if I wanted to compare 6 columns? +1 for your contribution though. – Nis May 20 '14 at 22:43
3

Here is query which might help you:

SELECT`TABLE_SCHEMA`, `table_name`,c1.`column_name`,c2.`column_name`,c3.`column_name` FROM `COLUMNS` C1
INNER JOIN `COLUMNS` C2 USING(`TABLE_SCHEMA`, `TABLE_NAME`)
INNER JOIN `COLUMNS` C3 USING(`TABLE_SCHEMA`, `TABLE_NAME`)
WHERE c1.`column_name` = 'col1'
AND c2.`column_name` = 'col2'
AND c3.`column_name` = 'col3';

But you will need to add|remove extra JOIN for each column you looking for.

Now i think it's kind of heavy and hard to understand, here is another option:

SELECT table_schema, table_name, count(*) AS `TablesCount` FROM `COLUMNS`
WHERE `column_name` IN ('col1','col1','col1')
GROUP BY table_schema, table_name
HAVING `TablesCount` = 3

TablesCount should be equal number of columns you looking for. If it's so, then your table have all required columns

Uriil
  • 11,948
  • 11
  • 47
  • 68
  • Thanks. I tried the first query and it works but is indeed heavy. However, where do I put column names `columnA` , `columnG` , `columnX` etc in second query? – Nis May 20 '14 at 06:23
  • @Nis Oh, sorry, i were testing and forget to remove my columns from query. Updated*(second row) – Uriil May 20 '14 at 06:39
  • Thanks. For anyone else interested in performace results, solution 2 had minimum double the speed (half execution time) then solution 1. E.g. If soultion 1 took 36 seconds (yes, I have huge DB), solution 2 would only take maximum 18 seconds. – Nis May 20 '14 at 22:40