0

Perhaps this was a wonky design decision on my part, but I created a table that has a column that holds from 1 to N comma-separated values, and I need to query whether any of several values are contained within those values.

To make it more clear, I'm allowing the user to select an array of movie genres (comedy, drama, etc.) and have a "movies" table where these genres are all contained in one column. For example, the genres column for the movie "The Princess Bride" contains the composite csv value "Adventure, Family, Fantasy"

So if the user selects at least one of those genres, that movie should be included in the result set that is returned based on the search criteria that they choose. But the movie should only be returned once, even if the user selected more than one of those genres to search for.

A previous incarnation of the database contained lookup tables and many-to-many tables which obviated this problem, but in an attempt to make it "easier" and more straightforward by making it one big (wide) table, I have run into this conundrum (of how to craft the query).

The query is currently dynamically built this way:

string baseQuery = "SELECT * FROM MOVIES_SINGLETABLE ";
string imdbRatingFilter = "WHERE IMDBRating >= @IMDBMinRating ";
string yearRangeFilter = "AND YearReleased BETWEEN @EarliestYear AND @LatestYear ";
string genreFilterBase = "AND GENRE IN ({0}) ";
string mpaaRatingFilterBase = "AND MPAARating IN ({0}) ";
string orderByPortion = "ORDER BY IMDBRating DESC, YearReleased DESC ";

...and the strings containing the selected criteria built like so:

if (filterGenres)
{
    if (ckbxAction.Checked) genresSelected = "\'Action\',"; // changed to "@" hereafter:
    if (ckbxAdventure.Checked) genresSelected = genresSelected + @"'Adventure',";
        . . .
    if (ckbxWar.Checked) genresSelected = genresSelected + @"'War',";
    if (ckbxWestern.Checked) genresSelected = genresSelected + @"'Western',";
    LastCommaIndex = genresSelected.LastIndexOf(',');
    genresSelected = genresSelected.Remove(LastCommaIndex, 1)
}
// the same situation holds for mpaaRatings as for Genres:
if (filterMPAARatings)
{
    if (ckbxG.Checked) mpaaRatingsSelected = @"'G',";
    if (ckbxPG.Checked) mpaaRatingsSelected = mpaaRatingsSelected + @"'PG',";
    if (ckbxPG13.Checked) mpaaRatingsSelected = mpaaRatingsSelected + @"'PG13',";
    if (ckbxNR.Checked) mpaaRatingsSelected = mpaaRatingsSelected + @"'NR',";
    LastCommaIndex = mpaaRatingsSelected.LastIndexOf(',');
    mpaaRatingsSelected = mpaaRatingsSelected.Remove(LastCommaIndex, 1);
}

. . .

//string genreFilterBase = "AND GENRES IN ({0}) ";
if (filterGenres)
{
    genreFilter = string.Format(genreFilterBase, genresSelected);
    completeQuery = completeQuery + genreFilter;
}
//string mpaaRatingFilterBase = "AND MPAARating IN ({0}) ";
if (filterMPAARatings)
{
    mpaaRatingFilter = string.Format(mpaaRatingFilterBase, mpaaRatingsSelected);
    completeQuery = completeQuery + mpaaRatingFilter;
}

Is my design salvageable? IOW, can I retrieve the appropriate data given these admittedly questionable table design decisions?

UPDATE

I tested GMB's SQL by incorporating it into my SQL, but I may be doing something wrong, because it won't compile:

enter image description here

I don't know why those commas are there, but I reckon GMB is more of a SQL expert than I am...

Nevertheless, this does work (sans the commas and pipes):

enter image description here

UPDATE 2

I tried using CONTAINS, also:

SELECT * FROM MOVIES_SINGLETABLE WHERE IMDBRating >= 7.5 AND YearReleased BETWEEN '1980' AND '2020' AND CONTAINS (genres, 'Adventure') OR CONTAINS (genres,'Family') OR CONTAINS (genres, 'Fantasy') AND CONTAINS (MPAARating, 'G') OR CONTAINS (MPAARating, 'PG') OR CONTAINS (MPAARating, 'PG-13') ORDER BY IMDBRating DESC, YearReleased DESC

...but got "Cannot use a CONTAINS or FREETEXT predicate on table or indexed view 'MOVIES_SINGLETABLE' because it is not full-text indexed."

The answer by Alex Aza here [https://stackoverflow.com/questions/6003240/cannot-use-a-contains-or-freetext-predicate-on-table-or-indexed-view-because-it] gives a solution, but apparently it's not available for SQL Server Express:

enter image description here

...and besides, this will eventually (soon) be migrated to a SQLite table, anyway, and I doubt SQLite would support CONTAINS if doing it in SQL Server (albeit Express) requires, even if possible at all, hurtling through hoops.

UPDATE 3

I incorporated Lukasz's idea for SQLite (as that's what I'm now querying), with a query string that ended up being:

SELECT MovieTitle, MPAARating, IMDBRating, DurationInMinutes, YearReleased, genres, actors, directors, screenwriters FROM MOVIES_SINGLETABLE WHERE IMDBRating >= @IMDBMinRating AND (YearReleased BETWEEN @EarliestYear AND @LatestYear) AND (INSTR(genres, @genre1) > 0 OR INSTR(genres, @genre2) > 0) AND (MPAARating = @mpaaRating1) ORDER BY IMDBRating DESC, YearReleased DESC LIMIT 1000

...but still get no results.

B. Clay Shannon-B. Crow Raven
  • 8,547
  • 144
  • 472
  • 862
  • 1
    Is there any reason why not to use LINQ here? – Rod Ramírez Nov 03 '20 at 22:32
  • No, LINQ would be fine, if it doesn't slow things down considerably. – B. Clay Shannon-B. Crow Raven Nov 03 '20 at 22:34
  • 2
    No "perhaps" about it. :) You fell victim to one of the [classic blunders](https://www.youtube.com/watch?v=RWW6aDpUvbQ) - the most famous of which is "never get involved in a land war in Asia" - but only slightly less well-known is this: "Never put csv values in a table column!" – Joel Coehoorn Nov 03 '20 at 22:37
  • Good one! William Goldman wrote so many memorable and quotable lines in that flick. – B. Clay Shannon-B. Crow Raven Nov 03 '20 at 22:39
  • @B.ClayShannon you tagged your question SQLite, so GMB posted SQLite code. This is not compatible with SQL Server. – forpas Nov 06 '20 at 18:39
  • 1
    I guess this already answered? Design is bad, you can get away with like conditions if database is small or perfomance is not needed, because all such queries will perform a full table scan. If that was for example Postgres database - you could store that in array column (which are indexable), but not here. Also, why year is nchar(4) and not some numeric type? – Evk Nov 06 '20 at 20:31
  • @Evk Why would year be a numeric type? I'm not doing any mathematical calculations on it. Is it more efficient, storage-wise? – B. Clay Shannon-B. Crow Raven Nov 06 '20 at 21:00
  • 1
    Right on that screenshot you are doing a between query for year, where you assume numerical comparision. – Evk Nov 06 '20 at 21:08
  • @Evk: True, but I convert the vals to ints while doing that comparison. I guess you're right; I should have made them ints from the git-go. – B. Clay Shannon-B. Crow Raven Nov 06 '20 at 21:31
  • https://stackoverflow.com/q/5071601/1506454 – ASh Nov 08 '20 at 19:06
  • 1
    @B.ClayShannon could you please add 3 or 4 lines of sample data and your expected result set for some parameter values. That would help me to both validate your query and suggest one. – Raihan Nov 10 '20 at 10:54

4 Answers4

2

Using SQL Server CHARINDEX:

SELECT * 
FROM MOVIES_SINGLETABLE 
WHERE IMDBRating >= 7.5 AND YearReleased BETWEEN '1980' AND '2020' 
AND (
    CHARINDEX ('Adventure',genres) > 0
 OR CHARINDEX ('Family',genres) > 0
 OR CHARINDEX ( 'Fantasy',genres)  > 0
)
AND (
    CHARINDEX ('G', MPAARating) > 0
 OR CHARINDEX ('PG', MPAARating) > 0
 OR CHARINDEX ('PG-13', MPAARating) > 0
)
ORDER BY IMDBRating DESC, YearReleased DESC

Or SQLite INSTR:

SELECT * 
FROM MOVIES_SINGLETABLE 
WHERE IMDBRating >= 7.5 AND YearReleased BETWEEN '1980' AND '2020' 
  AND (
      INSTR (genres, 'Adventure') > 0
   OR INSTR (genres,'Family') > 0
   OR INSTR (genres, 'Fantasy') > 0
  )
  AND (
      INSTR (MPAARating, 'G') > 0
   OR INSTR (MPAARating, 'PG') > 0
   OR INSTR (MPAARating, 'PG-13') > 0
  )
ORDER BY IMDBRating DESC, YearReleased DESC;

db<>fiddle demo

Notes:

  1. Added parentheses around OR condition
  2. Storing data in CSV format is not the best design(column does not contain atomic value).
Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
1

That's bad design indeed. Your first effort should go into fixing it: each element in the CSV list should be stored in a separate row. See: Is storing a delimited list in a database column really that bad?

One workaround would be to use a series of like conditions, like so:

and (
       ',' || mpaarating || ',' like '%,' || {0} || ',%'
    or ',' || mpaarating || ',' like '%,' || {1} || ',%'
    or ',' || mpaarating || ',' like '%,' || {2} || ',%'
)
GMB
  • 216,147
  • 25
  • 84
  • 135
1

This might be what you are looking for

DECLARE @TempTable TABLE (FilmName VARCHAR(128), GENRE VARCHAR(128))
INSERT INTO @TempTable VALUES ('Film 1', '1,2,5')
INSERT INTO @TempTable VALUES ('Film 2', '1,3,4')
INSERT INTO @TempTable VALUES ('Film 3', '6')

DECLARE @SearchGenre VARCHAR(128) = '3,4,2'
DECLARE @SearchGenreT TABLE (gens VARCHAR(8))
INSERT INTO @SearchGenreT SELECT * FROM STRING_SPLIT(@SearchGenre, ',')

SELECT * FROM @TempTable
WHERE 
(
SELECT COUNT(a.value) from STRING_SPLIT(GENRE, ',') a
JOIN /*STRING_SPLIT(@SearchGenre, ',')*/ @SearchGenreT b ON a.value = b.gens
) > 0
Muhab
  • 367
  • 1
  • 2
  • 13
1

Yes, it is possible to work within your limitations. It's not pretty, but it is functional. All of the below is within SQL Server T-SQL syntax.

  1. Find or create a function that can split your comma-delimited values into tables

  2. Run the genres column through your function in order to split your list of genres down into a miniature tables of individual values; we do this with CROSS APPLY

  3. In your WHERE clause, run the user's preferences through your function to split that into individual values; see the code for more clarity.

CREATE TABLE MOVIES_SINGLETABLE (
    MovieID nvarchar(10) NOT NULL,
    MovieTitle nvarchar(100) NOT NULL,
    Genres nvarchar(100) null
)
GO

INSERT INTO MOVIES_SINGLETABLE VALUES ('M001', 'The Princess Bride', 'Fantasy, Action, Comedy')
INSERT INTO MOVIES_SINGLETABLE VALUES ('M002', 'Die Hard', 'Action')
INSERT INTO MOVIES_SINGLETABLE VALUES ('M003', 'Elf', 'Christmas, Holiday, Comedy')
INSERT INTO MOVIES_SINGLETABLE VALUES ('M004', 'Percy Jackson and the Lightning-Thief', 'Fantasy')
go

DECLARE @genreList varchar(150) = 'Comedy, Action'

-- IN SQL 2016 onward
SELECT DISTINCT MovieTitle, Genres 
FROM MOVIES_SINGLETABLE m
CROSS APPLY STRING_SPLIT(m.genres, ',') mgenres
WHERE TRIM(mgenres.value) IN (SELECT TRIM(value) FROM string_split(@genreList, ','))


-- Before 2016, using a function dbo.stringSplit we create beforehand
-- notice the syntax is nearly identical
SELECT DISTINCT MovieTitle, Genres 
FROM MOVIES_SINGLETABLE m
CROSS APPLY dbo.stringSplit(m.genres, ',') mgenres
WHERE LTRIM(RTRIM((mgenres.value))) IN (SELECT LTRIM(RTRIM(value)) FROM dbo.stringSplit(@genreList, ','))

SQL Server 2016 onward has the functionality built in, and appears to exist in the Express license. However, here is the dbo.stringSplit function code I used, and the source if you want different variations of it:

/* SOURCE: https://stackoverflow.com/a/19935594/14443733 */
CREATE FUNCTION dbo.stringSplit (
    @list NVARCHAR(max),
    @delimiter NVARCHAR(255)
    )
RETURNS TABLE
AS
RETURN (
        SELECT [Value]
        FROM (
            SELECT [Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number], CHARINDEX(@delimiter, @List + @delimiter, [Number]) - [Number])))
            FROM (
                SELECT Number = ROW_NUMBER() OVER (
                        ORDER BY name
                        )
                FROM sys.all_columns
                ) AS x
            WHERE Number <= LEN(@List)
                AND SUBSTRING(@delimiter + @List, [Number], DATALENGTH(@delimiter) / 2) = @delimiter
            ) AS y
        );
GO
CoffeeNeedCoffee
  • 1,554
  • 3
  • 13