How do you select all rows which match one individual row in either of a thousand columns?

Question

For example, my personality match database has 1000 columns, with genre titles such as:

autoid | movie_genre_comedy | movie_genre_action | movie_genre_horror | more genres --> 
23432  | 1                  | 0                  | 1                  | 0
3241   | 0                  | 1                  | 1                  | 0
64323  | 0                  | 1                  | 0                  | 0

How do I match every row to the row with autoid 23432 so that the following table is produced:

autoid | movie_genre_comedy | movie_genre_action | movie_genre_horror | more genres --> 
23432  | 1                  | 0                  | 1                  | 0
3241   | 0                  | 1                  | 1                  |

Note that the row with autoid 64323 is not there because it does not have any similar columns to the chosen row with autoid 23432.

The simplest way to do this is:

SELECT *
from genretable
WHERE movie_genre_comedy = 1 
OR movie_genre_horror = 1 
OR ........... and so on for up to 1000 parameters.

I would say a table with 1000 columns indicates poor design. — Paul Bellora, Sep 25 '11 at 06:02
I would suggest looping through the columns, as in [this question](http://stackoverflow.com/questions/4950252/mysql-iterate-through-column-names). — denolk, Sep 25 '11 at 06:11
"genre_preferences" needs to be a separate table. "Person_id" + "preference" (comedy, action, horror, etc.) might be rows in the new table. Kublai Khan is right - a table with 1000 columns is indeed designed poorly. Change the design, and the query becomes easy. — paulsm4, Sep 25 '11 at 06:38
Pyjammez: your database has some design problems. All `movie_genre_AAA` columns should be values not columns: Movie(MovieID-PK, other columns), Genre(GenreID-PK,Name={comedy, action, etc.}, ... other columns), MovieGenre(MovieID-FK,GenreID-FK, PK(MovieID+GenreID)). — Bogdan Sahlean, Sep 25 '11 at 08:01

score 1 · Answer 1 · answered Jan 30 '13 at 00:36

The code you mentioned in your question is really the only way to do what you want with your current table structure. The answer is to create two new tables to map users to personality traits, like so:

create table `personality_trait_values`
(
     `id` smallint auto_increment primary key
    ,`value` varchar(20) not null unique
);

create table `personality_traits`
(
     `user_id` int not null references `users` (`autoid`)
    ,`personality_trait_id` int not null references `personality_trait_values` (`id`)
    ,unique (`user_id`,`personality_trait_id`)
);

With that, you can nuke the 1000 columns that describe whether the user has a personality trait or not, and your query becomes much more compact:

select u.`autoid`
    from `personality_traits` pt1
        join `personality_traits` pt2
            on pt1.`personality_trait_id` = pt2.`personality_trait_id`
            and pt1.`user_id` != pt2.`user_id`
    where pt1.`user_id` = `v_user_id_to_compare_to`

Where v_user_id_to_compare_to is a variable you have set previously in your stored procedure (to 23432, in the case of your question).

Converting the table structure you have now will be a bit tedious, but well worth it, and a lot of the tedium can be relieved by judicious use of copy/paste.

How do you select all rows which match one individual row in either of a thousand columns?

1 Answers1