All of us who work with relational databases have learned (or are learning) that SQL is different. Eliciting the desired results, and doing so efficiently, involves a tedious process partly characterized by learning unfamiliar paradigms, and finding out that some of our most familiar programming patterns don't work here. What are the common antipatterns you've seen (or yourself committed)?
-
Just to support [Point 1 of David's answer](https://stackoverflow.com/questions/346659/what-are-the-most-common-sql-anti-patterns#346679) check out [this example of situation when using select * may produce some unexpected results](https://stackoverflow.com/questions/321468/select-from-table-vs-select-cola-colb-etc-from-table-interesting-behaviour-in-s) – kristof Jan 14 '09 at 16:49
39 Answers
I am consistently disappointed by most programmers' tendency to mix their UI-logic in the data access layer:
SELECT
    FirstName + ' ' + LastName as "Full Name",
    case UserRole
        when 2 then 'Admin'
        when 1 then 'Moderator'
        else 'User'
    end as "User's Role",
    case SignedIn
        when 0 then 'Logged in'
        else 'Logged out'
    end as "User signed in?",
    Convert(varchar(100), LastSignOn, 101) as "Last Sign On",
    DateDiff(d, LastSignOn, getDate()) as "Days since last sign on",
    AddrLine1 + ' ' + AddrLine2 + ' ' + AddrLine3 + ' ' +
        City + ', ' + State + ' ' + Zip as "Address",
    'XXX-XX-' + Substring(
        Convert(varchar(9), SSN), 6, 4) as "Social Security #"
FROM Users
Normally, programmers do this because they intend to bind their dataset directly to a grid, and it's just convenient to have SQL Server do the formatting server-side rather than formatting on the client.
Queries like the one shown above are extremely brittle because they tightly couple the data layer to the UI layer. On top of that, this style of programming thoroughly prevents stored procedures from being reusable.
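The decoupled version returns raw columns and leaves labels, masking, and date formatting to the presentation layer (a sketch against the same Users table):

```sql
-- Return the raw data; the UI decides how to label and format it.
SELECT
    FirstName,
    LastName,
    UserRole,     -- UI maps 2 -> 'Admin', 1 -> 'Moderator', else 'User'
    SignedIn,
    LastSignOn,   -- format and compute "days since" client-side
    AddrLine1, AddrLine2, AddrLine3, City, State, Zip,
    SSN           -- mask in the presentation layer, never display raw
FROM Users
```

The same query can now feed a grid, a report, or an export without change.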

-
A good poster-child pattern for maximum coupling across the largest possible number of tiers/abstraction layers. – dkretz Dec 06 '08 at 22:33
-
It may not be good for de-coupling, though for performance reasons I've done stuff like that often; iterative changes done by SQL Server are faster than done by code in the mid-tier. I don't get your reusability point - nothing stops you from running the SP and renaming the cols if you so wish. – Joe Pineda Dec 07 '08 at 06:17
-
My favorite is when people embed HTML AND javascript, e.g. SELECT '' + name ' ' – Matt Rogish Jan 14 '09 at 17:19
-
With queries like this, you can edit the grid in a website with a simple alter statement. Or change the content of an export, or reformat a date in a report. This makes clients happy, and saves me time. So thanks, but no thanks, I'll stick with queries like this. – Andomar May 18 '09 at 15:13
-
Yes, someone actually does that. I've seen that commonly for MSAccess apps that use MSSQLServer as backend, because you need to tie the query directly to the continuous forms. – Oliver May 17 '13 at 10:50
Here are my top 3.
Number 1. Failure to specify a field list. (Edit: to prevent confusion: this is a production code rule. It doesn't apply to one-off analysis scripts - unless I'm the author.)
SELECT *
Insert Into blah SELECT *
should be
SELECT fieldlist
Insert Into blah (fieldlist) SELECT fieldlist
Number 2. Using a cursor and while loop, when a while loop with a loop variable will do.
DECLARE @LoopVar int
SET @LoopVar = (SELECT MIN(TheKey) FROM TheTable)
WHILE @LoopVar is not null
BEGIN
-- Do Stuff with current value of @LoopVar
...
--Ok, done, now get the next value
SET @LoopVar = (SELECT MIN(TheKey) FROM TheTable
WHERE @LoopVar < TheKey)
END
Number 3. DateLogic through string types.
--Trim the time
Convert(datetime, Convert(varchar(10), theDate, 121))
Should be
--Trim the time
DateAdd(dd, DateDiff(dd, 0, theDate), 0)
I've seen a recent spike of "One query is better than two, amiright?"
SELECT *
FROM blah
WHERE (blah.Name = @name OR @name is null)
AND (blah.Purpose = @Purpose OR @Purpose is null)
This query requires two or three different execution plans depending on the values of the parameters. Only one execution plan is generated and stuck into the cache for this SQL text. That plan will be used regardless of the value of the parameters. This results in intermittent poor performance. It is much better to write two queries (one query per intended execution plan).
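One common fix (a sketch) is to branch so each parameter combination compiles to its own plan; on SQL Server 2008+, adding `OPTION (RECOMPILE)` to the catch-all query is another way to get a fresh plan per invocation, at the cost of compiling every time:

```sql
-- Each branch is a distinct SQL text, so each gets its own cached plan.
IF @name IS NOT NULL AND @Purpose IS NOT NULL
    SELECT * FROM blah WHERE blah.Name = @name AND blah.Purpose = @Purpose
ELSE IF @name IS NOT NULL
    SELECT * FROM blah WHERE blah.Name = @name
ELSE IF @Purpose IS NOT NULL
    SELECT * FROM blah WHERE blah.Purpose = @Purpose
ELSE
    SELECT * FROM blah
```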

-
hmmm, I'll give you a +1 for points 2 and 3 alone, but developers overplay rule 1. It has its place sometimes. – annakata Dec 06 '08 at 20:01
-
When you use select *, you get whatever is in the table. Those columns may change names and order. Client code frequently relies on names and order. Every 6 months I'm asked how to preserve column order when modifying a table. If the rule was followed it wouldn't matter. – Amy B Dec 06 '08 at 20:11
-
I've used #2 sometimes, others I've gone the cursor route (though then I first save the results of the query on a table var, open the cursor on that). I've always wondered if someone has done a performance test of both. – Joe Pineda Dec 07 '08 at 06:13
-
@Joe, A cursor can allow parallelism, while a "loop over keys" or a table var can't. In that scenario, the cursor wins for performance. Here's another stackoverflow article about cursors: http://stackoverflow.com/questions/172526/sql-cursorsany-use-cases-you-would-defend#173216 – Amy B Dec 07 '08 at 16:23
-
another side effect of using "select *" approach described in point 1: http://stackoverflow.com/questions/321468/select-from-table-vs-select-cola-colb-etc-from-table-interesting-behaviour-in-s – kristof Apr 20 '09 at 22:40
-
@Cervo of course (well defined) cursors outperform loops. Many sql devs do not define cursors well. Here are my thoughts on that: http://stackoverflow.com/a/173216/8155 – Amy B Jan 23 '13 at 15:13
-
...but of course cursors should almost always be a last resort, after failure to figure out how to do the job with set-based SQL. I once spent about 45 minutes carefully dissecting a horrendous, gigantic PL/SQL cursor in a stored procedure (drew diagrams of the rotten thing), which populated a big temp table then selected the contents of the temp table back to the caller to render a report. It took 8.5 minutes to run, on substantial hardware. After diagramming the whole thing, I was able to replace it with a single query which returned the same results in under 2 seconds. Cursors, man... – Craig Tullis Oct 09 '14 at 03:44
-
Using @variables is an anti-pattern, since standard SQL doesn't know variables. – Holger Jakobs Oct 07 '20 at 09:51
-
How would you write the one query is better than two example and get the same behavior? Union all or temp table constructed based on some ifs? – Paul-Sebastian Manole Jul 04 '22 at 16:16
Human readable password fields, egad. Self explanatory.
Using LIKE against indexed columns, and I'm almost tempted to just say LIKE in general.
Recycling SQL-generated PK values.
Surprised nobody has mentioned the god-table yet. Nothing says "organic" like 100 columns of bit flags, large strings and integers.
Then there's the "I miss .ini files" pattern: storing CSVs, pipe-delimited strings or other parse-required data in large text fields.
And for MS SQL Server, the use of cursors at all. There's a better way to do any given cursor task.
Edited because there's so many!
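As an illustration of the cursor point, the classic row-by-row update collapses to one set-based statement (hypothetical table and rule):

```sql
-- Instead of cursoring over Employees and updating one row at a time:
UPDATE Employees
SET Salary = Salary * 1.10   -- 10% raise
WHERE DepartmentId = 42
```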

-
i dont understand the LIKE argument, I never used LIKE because I was bored, but only because they want wildcards search. In my current job every column in search is LIKED. I bet it will get optimized when its too slow, but i dont get the LIKE hate. – IAdapter Dec 06 '08 at 20:38
-
wrong about cursors, i would be hesitant about saying doing any particular thing is 100% right or 100% wrong – Shawn Dec 06 '08 at 23:41
-
At least in SQL Server, you can parse delimited strings faster than you can get data out of an XML column/object. So they do have their place, if you care about performance! And I can't understand your aversion to LIKE, I try hard and can't think of why stay away of LIKE. – Joe Pineda Dec 07 '08 at 06:22
-
Sometimes you just have to use a cursor! Unless you are willing to do your processing out of the database, like say in a specific purpose, home-brewed tiny utility... Think of a way to send a mail to a group of people, for instance, without either cursors or an external app – Joe Pineda Dec 07 '08 at 06:43
-
So far every cursor defense example I've seen is using the wrong tool for the job. But if all you know is SQL, you either use it inappropriately, or you learn to write other kinds of software. – dkretz Dec 08 '08 at 02:08
-
I would also defend the "parseable data" inside sql (in some cases). Sometimes the "plob" (parseable large object) is a type that you don't want to deal with in sql (each has different fields, etc.)... cf: http://writeonly.wordpress.com/2008/12/05/simple-object-db-using-json-and-python-sqlite/ – Gregg Lind Dec 12 '08 at 17:47
-
@tuinstoel: How does LIKE '%blah%' get to use an index? Indexing relies on ordering and this example searches a random middle position of a string. (Indexes order by the 1st character 1st, and so looking at the middle 4 characters gives a virtually random order...) – MatBailie Feb 04 '09 at 15:06
-
On most database servers (at least the ones I've used), LIKE can use indexes.. as long as it's a prefix-search (LIKE 'xxx%') -- that is, as long as the wildcard characters don't come first in the search string. I think you might be talking at cross-purposes here a little. – Cowan Jun 05 '10 at 07:50
-
I have yet to find one instance where a cursor was a good idea, except in some rare one-off query. – NicVerAZ Apr 07 '14 at 21:59
-
@NicVerAZ - the real need for cursors is in database admin where you may need to run system procs against multiple tables. That is a valid use. But generally DBAs know better than to do that against a large dataset. – HLGEM Mar 14 '16 at 20:03
-
Using a good RDBMS all LIKEs can use an index. Even SIMILAR TO and regexp queries can use indexes if they are created appropriately. – Holger Jakobs Oct 07 '20 at 09:53
Don't have to dig deep for it: Not using prepared statements.
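On SQL Server, for example, even dynamic SQL can be parameterized via sp_executesql instead of concatenating values into the statement text (table and parameter names here are hypothetical):

```sql
-- The plan is cached once, and the value travels as a parameter,
-- not as part of the SQL text.
EXEC sp_executesql
    N'SELECT COUNT(*) FROM Users WHERE UserName = @user',
    N'@user nvarchar(50)',
    @user = @UserNameInput
```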

-
Yup. Followed closely in the same context, in my experience, with "not trapping errors". – dkretz Dec 06 '08 at 22:37
-
@stesch: This is nothing compared to using views and having a variable reporting date. Views are an antipattern if you have a variable reporting date (i assume most applications have). Would add this in a separate answer, but it's closed, unfortunately. – Stefan Steiger Dec 12 '13 at 14:02
Using meaningless table aliases:
from employee t1,
department t2,
job t3,
...
Makes reading a large SQL statement so much harder than it needs to be
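Terse but meaningful aliases keep the same brevity without the guesswork, e.g.:

```sql
-- Initial-letter aliases point back at the table they abbreviate.
from employee e,
     department d,
     job j
...
```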

-
terse aliases are OKAY. If you want a meaningful name then don't use an alias at all. – Joel Coehoorn Dec 06 '08 at 22:56
-
He didn't say "terse," he said "meaningless." In my book there would be nothing wrong with using e, d, and j as the aliases in the example query. – Robert Rossney Dec 07 '08 at 09:14
-
I would use emp for employee, dep for department and job for job (or maybe jb) :) – Andrei Rînea Dec 17 '08 at 02:04
-
Aliasing fields/tables (to something sensible) is very useful in large and well organised data warehouses. It allows you to copy and paste a query, then just change one table name, relying on the alias for all its references. – MatBailie Feb 04 '09 at 15:07
-
@AndreiRinea your rules for creating aliases seem a bit unpredictable (and thus harder to think up/remember) compared to @RobertRossney’s ;-). – binki Oct 29 '13 at 16:09
var query = "select COUNT(*) from Users where UserName = '"
+ tbUser.Text
+ "' and Password = '"
+ tbPassword.Text +"'";
- Blindly trusting user input
- Not using parameterized queries
- Cleartext passwords
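A sketch of the repaired version (column and parameter names are hypothetical): the inputs travel as parameters, and only a password hash, computed in the application layer, is ever stored or compared:

```sql
-- Parameterized login check: no string concatenation, no cleartext password.
EXEC sp_executesql
    N'SELECT COUNT(*) FROM Users
      WHERE UserName = @user AND PasswordHash = @hash',
    N'@user nvarchar(50), @hash varbinary(64)',
    @user = @UserNameInput,
    @hash = @PasswordHashFromAppLayer
```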
-
All of which can usefully be dealt with by using a database abstraction layer of some (any) kind. – dkretz Dec 07 '08 at 03:04
-
@doofledorfer: Agree, a middle tier would be definitely better in a case like this, plus providing results caching as a nice side effect. – Joe Pineda Dec 07 '08 at 06:45
-
Awesome example. If a dev groks how to replace that with a good solution, they are half-way to becoming a decent SQL dev. – Steve McLeod Dec 07 '08 at 08:17
My bugbears are the 450-column Access tables that have been put together by the 8-year-old son of the Managing Director's best friend's dog groomer, and the dodgy lookup table that only exists because somebody doesn't know how to normalise a data structure properly.
Typically, this lookup table looks like this:
ID INT, Name NVARCHAR(132), IntValue1 INT, IntValue2 INT, CharValue1 NVARCHAR(255), CharValue2 NVARCHAR(255), Date1 DATETIME, Date2 DATETIME
I've lost count of the number of clients I've seen who have systems that rely on abominations like this.

-
Worse yet, I read that in the newest version of Access that's actually supported automatically, which I fear will *encourage* more of this Value1, Value2, Value3... column fetishism – Joe Pineda Dec 07 '08 at 06:18
-
The ones that I dislike the most are
Using spaces when creating tables, sprocs etc. I'm fine with CamelCase or under_scores and singular or plurals and UPPERCASE or lowercase but having to refer to a table or column [with spaces], especially if [ it is oddly spaced] (yes, I've run into this) really irritates me.
Denormalized data. A table doesn't have to be perfectly normalized, but when I run into a table of employees that has information about their current evaluation score or their primary anything, it tells me that I will probably need to make a separate table at some point and then try to keep them synced. I will normalize the data first and then if I see a place where denormalization helps, I'll consider it.
Overuse of either views or cursors. Views have a purpose, but when each table is wrapped in a view it's too much. I've had to use cursors a few times, but generally you can use other mechanisms for this.
Access. Can a program be an anti-pattern? We have SQL Server at my work, but a number of people use Access due to its availability, "ease of use" and "friendliness" to non-technical users. There is too much here to go into, but if you've been in a similar environment, you know.

-
Access is NOT a DBMS. It's a RAD environment, with a very simple database manager included. SQL Server, Oracle, et al. will *never* replace it, unless you add a VB-like language and a Crystal Reports like facility. – Joe Pineda Dec 07 '08 at 06:31
-
@JoePineda Not sure what you're on about citing VB and Crystal Reports, but I'd say 12 years later and C# and SSRS definitely surpassed them greatly at this point. While unfortunately Access is still used in the wild by less-than-technical people, it's surely dead at this point. :) – J.D. Jan 24 '21 at 14:01
For storing time values, only UTC timezone should be used. Local time should not be used.
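On SQL Server that might look like the following sketch (SYSUTCDATETIME() requires 2008+; GETUTCDATE() works on older versions):

```sql
-- Default every timestamp to UTC at the database level.
CREATE TABLE AuditLog (
    Id            INT IDENTITY PRIMARY KEY,
    OccurredAtUtc DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
)
```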

-
I've still not found a good simple solution for converting from UTC to local time for dates in the past, when daylight saving has to be considered, with varying change dates across years and countries, as well as all exceptions within countries. So UTC doesn't save you from conversion complexity. However, it's important to have a way to know the timezone of every stored datetime. – ckarras Jun 14 '09 at 11:38
-
@CsongorHalmai Many places practice daylight savings, so time values within an hour of the time shift can be ambiguous. – Frank Schwieterman Sep 03 '17 at 17:40
-
That's certainly right for the present and the past, but for the future, especially the fairly-far future, explicit time zones are often a necessity. If you have a 30-year option that was just written and expires in 2049-09-27T17:00:00 New York time, then you can't just blindly assume that will be 21:00:00Z. The U.S. Congress might well change the DST rules. You have to keep the local time and the true time zone (America/New_York) separate. – John Cowan Sep 27 '19 at 12:13
Using sp_ as the prefix of a stored procedure name, because SQL Server will first search the system procedures location rather than the custom ones.

-
Can also be extended to using any other common prefix for all stored procedures, making it more difficult to pick through a sorted list. – dkretz Dec 06 '08 at 22:36
-
+1 for doofledorfer comment!! I've seen this a lot, I find this idiotic and does indeed make searching for a particular SP *very* difficult!!! Also extended to "vw_" for views, "tbl_" for tables and the like, how I hate them! – Joe Pineda Dec 07 '08 at 06:24
-
The prefixes can be useful if you're scripting the objects to files (eg: for source control, deployments or migration) – Rick Jul 15 '09 at 23:59
-
Why on earth would it be useful to prefix every single stored procedure with sp *or* usp? It just makes it harder to scan the list for the one you want. – Ryan Lundy Dec 03 '09 at 20:14
Overuse of temporary tables and cursors.

-
Overuse of anything is by definition unwanted. A specific example of where using temp tables/cursors would not be needed would be helpful. – Jace Rhea Feb 08 '10 at 18:39
-
Mostly I see temp tables under-used. With SQL Server you often get performance gains by doing stuff with a bunch of temp tables instead of one monolithic query. – Cervo Sep 10 '10 at 20:05
Re-using a 'dead' field for something it wasn't intended for (e.g. storing user data in a 'Fax' field) - very tempting as a quick fix though!

using @@IDENTITY instead of SCOPE_IDENTITY()
Quoted from this answer :
- @@IDENTITY returns the last identity value generated for any table in the current session, across all scopes. You need to be careful here, since it's across scopes. You could get a value from a trigger, instead of your current statement.
- SCOPE_IDENTITY returns the last identity value generated for any table in the current session and the current scope. Generally what you want to use.
- IDENT_CURRENT returns the last identity value generated for a specific table in any session and any scope. This lets you specify which table you want the value from, in case the two above aren't quite what you need (very rare). You could use this if you want to get the current IDENTITY value for a table that you have not inserted a record into.
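The typical safe pattern looks like this (table and column names hypothetical): capture the key generated by this insert, unaffected by identity values a trigger may generate on other tables:

```sql
INSERT INTO Orders (CustomerId) VALUES (42)

SELECT SCOPE_IDENTITY() AS NewOrderId  -- this session, this scope
```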
-
Using @@ in SQL is already an anti-pattern, because ANSI/ISO SQL doesn't know variables. – Holger Jakobs Oct 07 '20 at 09:47
select some_column, ...
from some_table
group by some_column
and assuming that the result will be sorted by some_column. I've seen this a bit with Sybase where the assumption holds (for now).
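If the order matters, state it; the optimizer is free to use a hash aggregate or parallelism that destroys any incidental ordering:

```sql
SELECT some_column, ...
FROM some_table
GROUP BY some_column
ORDER BY some_column   -- the only guarantee of sorted output
```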

-
upvote for EVER assuming sort order, just because that was the way it showed up in the query tool that one time – Joel Coehoorn Dec 06 '08 at 23:00
-
In Oracle, the unsorted results (almost) always matched the grouping - until version 10G. Lots of rework for the developers who used to leave out the ORDER BY! – Tony Andrews Jan 14 '09 at 17:05
-
I was even in a training class where this was stated as a fact for SQL Server. I had to protest really loudly. Just to save typing 20 characters you rely on obscure or undocumented behavior. – erikkallen Dec 16 '09 at 21:42
-
SELECT FirstName + ' ' + LastName as "Full Name", case UserRole when 2 then "Admin" when 1 then "Moderator" else "User" end as "User's Role", case SignedIn when 0 then "Logged in" else "Logged out" end as "User signed in?", Convert(varchar(100), LastSignOn, 101) as "Last Sign On", DateDiff('d', LastSignOn, getDate()) as "Days since last sign on", AddrLine1 + ' ' + AddrLine2 + ' ' + AddrLine3 + ' ' + City + ', ' + State + ' ' + Zip as "Address", 'XXX-XX-' + Substring(Convert(varchar(9), SSN), 6, 4) as "Social Security #" FROM Users
Or, cramming everything into one line.

-
Used a previous comment's query, just because that was the first SQL-statement I had available. – Jasper Bekkers Apr 19 '09 at 03:54
The
FROM TableA, TableB WHERE
syntax for JOINs rather than
FROM TableA INNER JOIN TableB ON
Making assumptions that a query will be returned sorted a certain way without putting an ORDER BY clause in, just because that was the way it showed up during testing in the query tool.

-
My Oracle DBAs always complain that I use "ANSI joins", that is, what you present as the correct way. But I keep doing it, and I suspect that deep down they know it's better. – Steve McLeod Dec 07 '08 at 08:19
-
I suspect that Oracle wishes standard SQL would go away. :-) Also, you can't mix implicit and explicit JOINs (aka ANSI JOINs) in MySQL 5 - it doesn't work. Which is another argument for explicit JOINs. – staticsan Dec 08 '08 at 00:42
-
I would say that even A INNER JOIN B ON is an anti pattern. I prefer A INNER JOIN B USING. – John Nilsson Mar 10 '09 at 21:39
-
Oracle supports ANSI syntax now, but they used to have this really weird syntax for outer joins in the past and there are too many people still using it. – Cervo Sep 10 '10 at 20:06
-
well...Oracle still won't let you use ANSI joins for [Fast Refreshable, On-Commit Materialized Views](https://asktom.oracle.com/pls/apex/f?p=100:11:::NO:RP:P11_QUESTION_ID:9533445800346654016) – Gerrat May 17 '18 at 18:12
-
Using the words INNER and OUTER is an anti-pattern, since they don't change an iota of the query and are just noise. – Holger Jakobs Oct 07 '20 at 09:49
-
@HolgerJakobs Agree on "OUTER" (just use "LEFT" or "RIGHT"), but I've worked with a DB where "JOIN" by itself was ambiguous, so now I always specify "INNER" for clarity and never just JOIN by itself. – Joel Coehoorn Oct 07 '20 at 13:02
-
@JoelCoehoorn Interesting. Would you mind telling which DBMS it was? Of course JOIN needs NATURAL, ON or USING, but I have never encountered a DBMS needing INNER as well. – Holger Jakobs Oct 08 '20 at 20:53
-
@HolgerJakobs This is waaaay back. Some pre-GUI system that ran on Netware. And as I think back, the problem wasn't that it required "INNER", but that JOIN by itself defaulted to "LEFT OUTER". Therefore we *always* specified INNER for both that and the (very old) Sql Server 2000 server we used for another system. – Joel Coehoorn Oct 08 '20 at 22:25
Learning SQL in the first six months of their career and never learning anything else for the next 10 years. In particular not learning or effectively using windowing/analytical SQL features. In particular the use of over() and partition by.
Window functions, like aggregate functions, perform an aggregation on a defined set (a group) of rows, but rather than returning one value per group, window functions can return multiple values for each group.
See O'Reilly SQL Cookbook Appendix A for a nice overview of windowing functions.
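A small illustration (against a hypothetical employees table): each row keeps its identity while carrying a per-group aggregate alongside it:

```sql
SELECT employee_id,
       department_id,
       salary,
       AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary
FROM employees
```

Doing the same with a GROUP BY would require joining the aggregate back to the detail rows.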

I need to put my own current favorite here, just to make the list complete. My favorite antipattern is not testing your queries.
This applies when:
- Your query involves more than one table.
- You think you have an optimal design for a query, but don't bother to test your assumptions.
- You accept the first query that works, with no clue about whether it's even close to optimized.
And any tests run against atypical or insufficient data don't count. If it's a stored procedure, put the test statement into a comment and save it, with the results. Otherwise, put it into a comment in the code with the results.

-
A very useful technique for minimal T-SQL test: In the .SQL file where you define your SP, UDF, etc., immediately after it create a block test like IF 1=2 BEGIN (sample cases for your code, with expected results as comments) END – Joe Pineda Dec 07 '08 at 06:34
-
SQL Server does parse the code within the test block, even though it's never executed. So when your object gets modified and receives more parameters, or of different type, etc. or an objects it depends on is modified you'll receive an error just by asking for an execution plan! – Joe Pineda Dec 07 '08 at 06:37
-
It's not always possible to test with real data. Often the dev/"test" server is underpowered and gets a fraction of the live server. Generally tests are frowned on against the live server. Some places are better and have a test or staging server with live data. – Cervo Sep 10 '10 at 20:08
Temporary Table abuse.
Specifically this sort of thing:
SELECT personid, firstname, lastname, age
INTO #tmpPeople
FROM People
WHERE lastname like 's%'
DELETE FROM #tmpPeople
WHERE firstname = 'John'
DELETE FROM #tmpPeople
WHERE firstname = 'Jon'
DELETE FROM #tmpPeople
WHERE age > 35
UPDATE People
SET firstname = 'Fred'
WHERE personid IN (SELECT personid from #tmpPeople)
Don't build a temporary table from a query, only to delete the rows you don't need.
And yes, I have seen pages of code in this form in production DBs.
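For the example above, the filtering belongs in the WHERE clause of a single statement (rows with a NULL age would need explicit handling if the original kept them):

```sql
UPDATE People
SET firstname = 'Fred'
WHERE lastname LIKE 's%'
  AND firstname NOT IN ('John', 'Jon')
  AND age <= 35
```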
-
+1, I agree. Although, I have found at least one or two cases where this technique has improved performance - the queries involved were complex to say the least. – a'r Aug 30 '10 at 10:45
-
Sometimes you have to do that if the conditions are super complicated. True it can be abused to extremes. But many times a simple delete is much simpler than the logic for getting the case in the initial query. Also sometimes if the clause is not sargeable the initial query will slow down. But just doing it on the smaller temp table is more efficient. And other times you keep adding cases that business people keep adding after the fact. – Cervo Sep 10 '10 at 20:13
I just put this one together, based on some of the SQL responses here on SO.
It is a serious antipattern to think that triggers are to databases as event handlers are to OOP. There's this perception that just any old logic can be put into triggers, to be fired off when a transaction (event) happens on a table.
Not true. One of the big differences are that triggers are synchronous - with a vengeance, because they are synchronous on a set operation, not on a row operation. On the OOP side, exactly the opposite - events are an efficient way to implement asynchronous transactions.

Contrarian view: over-obsession with normalization.
Most SQL/RDBMS systems give you lots of features (transactions, replication) that are quite useful, even with unnormalized data. Disk space is cheap, and sometimes it can be simpler (easier code, faster development time) to manipulate / filter / search fetched data than it is to write up a 1NF schema and deal with all the hassles therein (complex joins, nasty subselects, etc).
I have found that over-normalized systems are often a premature optimization, especially during early development stages.
(more thoughts on it... http://writeonly.wordpress.com/2008/12/05/simple-object-db-using-json-and-python-sqlite/)

-
Sometimes it is, sometimes it isn't. Luckily, it's often easy to test, and different options work with different db needs. – Gregg Lind Jan 05 '09 at 00:26
-
Normalization is not just for disk space savings. It is also to create an authoritative source for the data. If the data is stored only one place, then consistency is not a byproduct of careful coding, but is instead a byproduct of design. – Grant Johnson Jan 12 '12 at 21:27
-
Storing compound data in JSON format is one thing: there is more and more support for it, and it's a conscious tradeoff. Using comma-separated (or whatever) values in an attempt to save one join is penny-wise and pound-foolish. – John Cowan Sep 27 '19 at 12:20
-
noSQL solutions are showing a performance benefit at the cost of duplicate data by eliminating multi-table lookups. Puts the whole normalization thing on its head. In some examples the data is collected in multiple places to ensure one process has the fastest response time possible. Of course, questions about authoritative sources come in to play. – barrypicker Dec 05 '19 at 22:47
-
Normalization is for consistency in a transactional system, basically a way to better represent an object graph in storage and preserve a single source of truth. De-normalization is for IO throughput, and mostly analytics workloads. They both serve a different purpose and provide a tradeoff between speed and consistency. – Paul-Sebastian Manole Jul 04 '22 at 22:49
Stored Procedures or Functions without any comments...

-
And views ;) Functions true, except table-valued functions (=views with parameters). – Stefan Steiger Dec 12 '13 at 14:22
-
1) I don't know if it's an "official" anti-pattern, but I dislike and try to avoid string literals as magic values in a database column.
An example from MediaWiki's table 'image':
img_media_type ENUM("UNKNOWN", "BITMAP", "DRAWING", "AUDIO", "VIDEO",
"MULTIMEDIA", "OFFICE", "TEXT", "EXECUTABLE", "ARCHIVE") default NULL,
img_major_mime ENUM("unknown", "application", "audio", "image", "text",
"video", "message", "model", "multipart") NOT NULL default "unknown",
(I just notice different casing, another thing to avoid)
I design such cases as int lookups into tables ImageMediaType and ImageMajorMime with int primary keys.
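The lookup-table version might be sketched as follows (the names are my own invention):

```sql
CREATE TABLE ImageMediaType (
    ImageMediaTypeId INT PRIMARY KEY,
    Name             NVARCHAR(50) NOT NULL UNIQUE
)

-- and in the image table, reference the key instead of a string:
-- img_media_type INT NULL REFERENCES ImageMediaType (ImageMediaTypeId)
```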
2) date/string conversion that relies on specific NLS settings
CONVERT(NVARCHAR, GETDATE())
without format identifier
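Passing an explicit style number pins the output format regardless of server language settings, e.g.:

```sql
-- Style 126 = ISO 8601, independent of NLS/locale settings.
CONVERT(NVARCHAR(30), GETDATE(), 126)
```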

-
Why is this bad? Surely if you are trying to express a set of values this works just as well as a lookup table, and fits better with code that calls it. I'd rather have an enum in my app code that maps to an enum constraint in my DB than an enum in my app code that maps to specific rows of a lookup table. It just feels cleaner. – Jack Ryan Feb 08 '10 at 18:10
-
@JackRyan: This is bad because when you change the enum list later, you need to remember to change it in two places now. It violates [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself). The database should be the single source of truth. – Gerrat May 17 '18 at 18:05
Identical subqueries in a query.
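Since SQL Server 2005 (and in standard SQL), a common table expression lets you name the subquery once instead of repeating it (hypothetical tables):

```sql
WITH recent_orders AS (
    SELECT customer_id, MAX(order_date) AS last_order
    FROM orders
    GROUP BY customer_id
)
SELECT c.customer_id, r.last_order
FROM customers c
JOIN recent_orders r ON r.customer_id = c.customer_id
```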

-
Unfortunately, sometimes you just can't avoid that - in SQL 2000 there was no "WITH" keyword, and using UDFs to encapsulate common subqueries sometimes leads to performance penalties, blame MS on that... – Joe Pineda Dec 07 '08 at 06:25
-
Well, hopefully they will get around to adding it one of these days. – EvilTeach Dec 09 '08 at 02:24
-
@recursive: you can't have indexes on a table variable, which will often make it slower than a subquery. However you could use a temporary table with custom indexes. – Rick Jul 16 '09 at 00:08
-
Cool, have been working with SQL for years, and didn't even know Common Table Expressions exist (though I would have needed them). Now I do! Thanks! – sleske Oct 29 '09 at 23:27
The Altered View - A view that is altered too often, without notice or reason. The change will either be noticed at the most inappropriate time or, worse, be wrong and never noticed. Maybe your application will break because someone thought of a better name for that column. As a rule, views should extend the usefulness of base tables while maintaining a contract with consumers. Fix problems, but don't add features or, worse, change behavior - for that, create a new view. To mitigate, do not share views with other projects and use CTEs when platforms allow. If your shop has a DBA you probably can't change views, but all your views will be outdated and/or useless in that case.
The !Paramed - Can a query have more than one purpose? Probably, but the next person who reads it won't know until deep meditation. Even if you don't need them right now, chances are you will, even if it's "just" to debug. Adding parameters lowers maintenance time and keeps things DRY. If you have a where clause, you should have parameters.
The case for no CASE -
SELECT CASE @problem
    WHEN 'Need to replace column A with this medium to large collection of strings hanging out in my code.'
        THEN 'Create a table for lookup and add to your from clause.'
    WHEN 'Scrubbing values in the result set based on some business rules.'
        THEN 'Fix the data in the database.'
    WHEN 'Formatting dates or numbers.'
        THEN 'Apply formatting in the presentation layer.'
    WHEN 'Creating a cross tab.'
        THEN 'Good, but in reporting you should probably be using cross tab, matrix or pivot templates.'
    ELSE 'You probably found another case for no CASE but now I have to edit my code instead of enriching the data...'
END

- 47,048
- 33
- 131
- 198

- 9,804
- 5
- 34
- 41
The two I find the most, and which can have a significant cost in terms of performance, are:
Using cursors instead of a set-based expression. I guess this one occurs frequently when the programmer is thinking procedurally.
Using correlated sub-queries, when a join to a derived table can do the job.
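The second point can be sketched side by side. This is a minimal example using Python's `sqlite3` module with a made-up `orders` table: the correlated form conceptually re-evaluates the subquery per outer row, while the derived-table form computes the aggregate once and joins to it (a good optimizer may produce the same plan for both, as the comments below discuss).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust TEXT, amount INTEGER);
    INSERT INTO orders VALUES (1,'a',10), (2,'a',30), (3,'b',20), (4,'b',5);
""")

# Correlated subquery: logically runs once per row of the outer query.
correlated = conn.execute("""
    SELECT o.id, o.cust, o.amount
    FROM orders o
    WHERE o.amount = (SELECT MAX(amount) FROM orders i WHERE i.cust = o.cust)
    ORDER BY o.id
""").fetchall()

# Derived table: compute each customer's max once, then join to it.
derived = conn.execute("""
    SELECT o.id, o.cust, o.amount
    FROM orders o
    JOIN (SELECT cust, MAX(amount) AS mx FROM orders GROUP BY cust) m
      ON m.cust = o.cust AND m.mx = o.amount
    ORDER BY o.id
""").fetchall()

print(correlated == derived)  # True: same result set either way
```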

- 295,962
- 43
- 465
- 541
-
I agree if you mean what I think you mean; although a correlated sub-query is a type of derived table IIRC. – dkretz Dec 07 '08 at 05:08
-
1A derived table is a set operation, whereas a correlated subquery runs for each row in the outer query, making it less efficient (9 times out of 10) – Mitch Wheat Dec 07 '08 at 06:07
-
A couple years ago I found to my surprise that SQL S. is somehow optimized for handling correlated queries: for simple ones you get the same execution plan as with a logically equivalent query using a JOIN! Also, correlated queries that bring Oracle to its knees run only slowly on SQL S.! – Joe Pineda Dec 07 '08 at 06:39
-
That's why I always test it both ways. And I do usually try it both ways. In practice, for SQL Server anyway, I've usually found the correlated sq to be no slower. – dkretz Dec 07 '08 at 08:46
-
I cheated and googled "correlated subquery derived table". That's where I discovered two sources saying a csq is one type of derived table. (It also mentioned the common misapprehension among SQL Server users. I didn't even know enough to be confused; so Mitch is up on points.) – dkretz Dec 07 '08 at 08:49
-
But a csq is, in fact, a table that is derived as well - just repeatedly for each row. (Unless it's optimized away, of course, which does happen.) – dkretz Dec 07 '08 at 08:50
-
-
4PLEASE understand that a correlated subquery and a join are IDENTICAL (in most cases). They are not even different things that are optimized to one another, but just different textual representations of the same operation. – erikkallen Dec 16 '09 at 21:54
-
Related to Mitch's answer: our app had one operation that used a CSQ in another CSQ (the dynamic nature of this part of the app demands the setup. it really does.) When one client got to a certain size dataset, the query exploded to 4.9 BILLION reads. By unrolling the inner query, we reduced the runtime of this particular bit by 99%. – DaveE Feb 08 '10 at 17:25
Developers who write queries without having a good idea about what makes SQL applications (both individual queries and multi-user systems) fast or slow. This includes ignorance about:
- physical I/O minimization strategies, given that most queries' bottleneck is I/O not CPU
- perf impact of different kinds of physical storage access (e.g. lots of sequential I/O will be faster than lots of small random I/O, although less so if your physical storage is an SSD!)
- how to hand-tune a query if the DBMS produces a poor query plan
- how to diagnose poor database performance, how to "debug" a slow query, and how to read a query plan (or EXPLAIN, depending on your DBMS of choice)
- locking strategies to optimize throughput and avoid deadlocks in multi-user applications
- importance of batching and other tricks to handle processing of data sets
- table and index design to best balance space and performance (e.g. covering indexes, keeping indexes small where possible, reducing data types to minimum size needed, etc.)
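The "read a query plan" and index-design points are easy to demonstrate. A minimal sketch using SQLite's `EXPLAIN QUERY PLAN` (other DBMSs have their own equivalents, e.g. `EXPLAIN` in MySQL/PostgreSQL or "Display Estimated Execution Plan" in SQL Server); the table and index names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")

# Without an index, the planner has no choice but a full table scan.
plan_noindex = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE id = 1").fetchall()
print(plan_noindex[0][3])   # the detail column mentions a SCAN of t

conn.execute("CREATE INDEX idx_t_id ON t(id)")

# With the index in place, the same query becomes an index search.
plan_with_index = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE id = 1").fetchall()
print(plan_with_index[0][3])  # now a SEARCH using idx_t_id
```

Reading plans like these is the first step in "debugging" a slow query: a scan where you expected a search usually means a missing or unusable index.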

- 44,807
- 15
- 124
- 208
Using primary keys as a surrogate for record addresses and using foreign keys as a surrogate for pointers embedded in records.

- 18,205
- 2
- 28
- 58
Putting stuff in temporary tables. People who switch from SQL Server to Oracle in particular have a habit of overusing temporary tables; just use nested select statements instead.

- 7,248
- 27
- 27
-
Depending on the type of nested select (IN, EXISTS) they can run for every row of the parent query which is super slow. – Paul-Sebastian Manole Jul 04 '22 at 22:46
Application Joins Not solely an SQL issue, but looking for descriptions of the problem and finding this question, I was surprised it wasn't listed.
As I've heard the phrase used, an application join is when you pull a set of rows out of each of two or more tables and then join them in your (Java) code with a pair of nested for loops. This burdens the system (your app and the database) with identifying the whole cross product, retrieving it, and sending it to the application. Even assuming the app can filter the cross product as fast as the database can (dubious), just cutting the result set down sooner means less data transfer.
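The anti-pattern and its fix can be sketched in a few lines. A minimal example using Python's `sqlite3` module with hypothetical `emp`/`dept` tables; the nested-loop version also does an O(n·m) comparison in the app, which the database's join algorithms would avoid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER, deptno INTEGER);
    CREATE TABLE dept (deptno INTEGER, dname TEXT);
    INSERT INTO emp VALUES (1, 10), (2, 20), (3, 10);
    INSERT INTO dept VALUES (10, 'SALES'), (20, 'IT');
""")

# Anti-pattern: fetch both tables and join them in application code.
emps = conn.execute("SELECT empno, deptno FROM emp").fetchall()
depts = conn.execute("SELECT deptno, dname FROM dept").fetchall()
app_join = sorted((e, dn) for e, ed in emps for dd, dn in depts if ed == dd)

# Better: let the database do the join and ship only the joined rows.
sql_join = sorted(conn.execute("""
    SELECT e.empno, d.dname
    FROM emp e JOIN dept d ON d.deptno = e.deptno
""").fetchall())

print(app_join == sql_join)  # True, but the SQL version moves far less data
```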

- 820
- 7
- 22
Using SQL as a glorified ISAM (Indexed Sequential Access Method) package. In particular, nesting cursors instead of combining SQL statements into a single, albeit larger, statement. This also counts as 'abuse of the optimizer' since in fact there isn't much the optimizer can do. This can be combined with non-prepared statements for maximum inefficiency:
DECLARE c1 CURSOR FOR SELECT Col1, Col2, Col3 FROM Table1
FOREACH c1 INTO a.col1, a.col2, a.col3
DECLARE c2 CURSOR FOR
SELECT Item1, Item2, Item3
FROM Table2
WHERE Table2.Item1 = a.col2
FOREACH c2 INTO b.item1, b.item2, b.item3
...process data from records a and b...
END FOREACH
END FOREACH
The correct solution (almost always) is to combine the two SELECT statements into one:
DECLARE c1 CURSOR FOR
SELECT Col1, Col2, Col3, Item1, Item2, Item3
FROM Table1, Table2
WHERE Table2.Item1 = Table1.Col2
-- ORDER BY Table1.Col1, Table2.Item1
FOREACH c1 INTO a.col1, a.col2, a.col3, b.item1, b.item2, b.item3
...process data from records a and b...
END FOREACH
The only advantage to the double loop version is that you can easily spot the breaks between values in Table1 because the inner loop ends. This can be a factor in control-break reports.
Also, sorting in the application is usually a no-no.
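The single-query version can still drive a control-break report: you just watch for the Table1 key changing between rows. A sketch in Python against `sqlite3` (the tables mirror the hypothetical `Table1`/`Table2` above), relying on the combined query's ORDER BY so each group's rows arrive together:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col1 TEXT, col2 INTEGER);
    CREATE TABLE t2 (item1 INTEGER, item2 TEXT);
    INSERT INTO t1 VALUES ('a', 1), ('b', 2);
    INSERT INTO t2 VALUES (1, 'x'), (1, 'y'), (2, 'z');
""")

breaks = []
prev = object()  # sentinel that never equals a real key
for col1, item2 in conn.execute("""
        SELECT t1.col1, t2.item2
        FROM t1 JOIN t2 ON t2.item1 = t1.col2
        ORDER BY t1.col1, t2.item2"""):
    if col1 != prev:          # control break: a new Table1 group starts here
        breaks.append(col1)
        prev = col1
    # ...process the joined row...
print(breaks)  # one break per Table1 group: ['a', 'b']
```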

- 730,956
- 141
- 904
- 1,278
-
The style, although not this syntax, is particularly rampant in PHP in my experience. – dkretz Dec 06 '08 at 22:32
-
The syntax is actually IBM Informix-4GL - but it is clear enough not to need much in the way of explanation (I think). And the style is rampant in a lot of SQL programs - regardless of programming language. – Jonathan Leffler Dec 07 '08 at 07:58
-
Except for the fact that you're using a well known antipattern (implicit joins) to illustrate your antipattern, kind of defeats the point. – Johan Sep 01 '11 at 09:55
-
And of course the use of cursors at all is a SQL antipattern. Virtually all cursors can be rewritten as set-based operations. The few that can't are the kind that only DBAs with years of experience who understand how the internals of the database work should be writing. No application dev should ever need to write a SQL cursor. – HLGEM Jan 07 '15 at 19:00
I just came across view definition like this:
CREATE OR REPLACE FORCE VIEW PRICE (PART_NUMBER, PRICE_LIST, LIST_VERSION ...)
AS
SELECT sp.MKT_PART_NUMBER,
sp.PRICE_LIST,
sp.LIST_VERSION,
sp.MIN_PRICE,
sp.UNIT_PRICE,
sp.MAX_PRICE,
...
There are 50 or so columns in the view. Some developers take a small pride in torturing others by not providing column aliases, so one has to count column offsets in both places to figure out which view column corresponds to which.

- 3,066
- 2
- 19
- 20
I have seen too many people holding on for dear life to `IN (...)` while totally oblivious to `EXISTS`. For a good example, see Symfony Propel ORM.
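For the positive case the two forms agree, but they diverge badly when NULLs enter the picture, which is the classic trap. A sketch using Python's `sqlite3` module with hypothetical `t1`/`t2` tables (the equivalence/divergence holds in any SQL DBMS):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col2 INTEGER);
    CREATE TABLE t2 (col2 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (NULL);
""")

# IN and EXISTS agree for the positive case...
in_rows = conn.execute(
    "SELECT col2 FROM t1 WHERE col2 IN (SELECT col2 FROM t2)").fetchall()
exists_rows = conn.execute("""
    SELECT col2 FROM t1
    WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.col2 = t1.col2)""").fetchall()
print(in_rows == exists_rows)   # True: both return just the matching row

# ...but NOT IN against a set containing NULL matches nothing
# (col2 <> NULL is UNKNOWN), while NOT EXISTS does what most people expect.
not_in = conn.execute(
    "SELECT col2 FROM t1 WHERE col2 NOT IN (SELECT col2 FROM t2)").fetchall()
not_exists = conn.execute("""
    SELECT col2 FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.col2 = t1.col2)""").fetchall()
print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```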

- 6,169
- 2
- 36
- 40
-
I try not to blame people for this. `IN` is fairly intuitive, except for the occasional issue with `NULL`s. On the other hand, `EXISTS` often requires understanding correlated sub-queries, which are a little more advanced. – J. Mini May 30 '22 at 18:46
-
This example is pretty bad. There is no actual reference or explanation why `EXISTS` should be preferred over `IN` (it shouldn't, they both have different uses and behaviors even if somewhat similar). `EXISTS` runs a sub-query for every row of the parent query to be checked. `IN` just does a comparison with static data. – Paul-Sebastian Manole Jul 04 '22 at 21:54
-
A simple `EXISTS` example would be `SELECT * FROM table1 WHERE col1 = x AND EXISTS (SELECT 1 FROM table2 WHERE table1.col2 = table2.col2 AND col3 = y)`, which tends to be written as `SELECT * FROM table1 WHERE col1 = x AND col2 IN (SELECT col2 FROM table2 WHERE col3 = y)` by people who are not aware of the existence of `EXISTS`. The `IN` method works for the special case of `EXISTS` where only 1 column is needed to link the tables. Past that, a google search of "sql in multiple columns" should yield https://stackoverflow.com/questions/1136380/sql-where-in-clause-multiple-columns – sayap Jul 05 '22 at 23:05
-
With that said, postgresql does support `IN` with multiple columns: https://www.postgresql.org/docs/current/functions-subquery.html – sayap Jul 05 '22 at 23:10
re: using @@IDENTITY instead of SCOPE_IDENTITY(): you should use neither; use the OUTPUT clause instead.

- 5,975
- 35
- 41
-
link is broken. Better to include the relevant text than include the link – Thronk Aug 03 '16 at 14:55
Joining redundant tables into a query like this:
select emp.empno, dept.deptno
from emp
join dept on dept.deptno = emp.deptno;

- 129,880
- 21
- 220
- 259
-
7That actually does do something: It only returns the `emp` rows if the corresponding `dept` row exists. – Ariel Oct 06 '11 at 08:43
-
4@Ariel: The point is, it is a [semi-join](http://en.wikipedia.org/wiki/Relational_algebra#Semijoin) but that fact is not immediately apparent. Probably better written using `emp.deptno` (rather than `dept.deptno`) in the `SELECT` clause, even better to use `SELECT DISTINCT` (rather than `SELECT ALL`). But perhaps best to use something that is more obviously a semi-join e.g. `SELECT empno, deptno FROM emp WHERE EXISTS (SELECT * FROM dept WHERE dept.deptno = emp.deptno);` – onedaywhen Oct 12 '11 at 14:33
My favourite SQL anti-patterns:
`JOIN` on non-unique columns and using `SELECT DISTINCT` to trim the result.
Creating a view joining many tables just to select few columns from one table.
CREATE VIEW my_view AS
SELECT * FROM table1
JOIN table2 ON (...)
JOIN table3 ON (...);
SELECT col1, col2 FROM my_view WHERE col3 = 123;

- 892
- 8
- 12
Having 1 table
code_1
value_1
code_2
value_2
...
code_10
value_10
Instead of having 3 tables
code, value and code_value
You never know when you may need more than 10 code/value pairs.
And you don't waste disk space if you only need one pair.
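The three-table design can be sketched briefly. A minimal illustration using Python's `sqlite3` module; the column names are hypothetical, and the point is that adding an eleventh pair is an INSERT, not a schema change:

```python
import sqlite3

# Normalized design: code, value, and a code_value link table,
# instead of code_1/value_1 ... code_10/value_10 columns on one table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE code  (code_id  INTEGER PRIMARY KEY, label   TEXT);
    CREATE TABLE value (value_id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE code_value (
        code_id  INTEGER REFERENCES code(code_id),
        value_id INTEGER REFERENCES value(value_id),
        PRIMARY KEY (code_id, value_id)
    );
    INSERT INTO code  VALUES (1, 'color'), (2, 'size');
    INSERT INTO value VALUES (1, 'red'),   (2, 'XL');
    INSERT INTO code_value VALUES (1, 1), (2, 2);
""")

# Any number of pairs per entity now fits without touching the schema.
pairs = conn.execute("""
    SELECT c.label, v.content
    FROM code_value cv
    JOIN code  c ON c.code_id  = cv.code_id
    JOIN value v ON v.value_id = cv.value_id
    ORDER BY c.code_id
""").fetchall()
print(pairs)  # [('color', 'red'), ('size', 'XL')]
```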

- 16,630
- 26
- 74
- 89
-
I don't understand why you would need a `code_value` table? Isn't that just data duplication? – Mawg says reinstate Monica Feb 21 '17 at 13:18
Maybe not an anti-pattern, but it annoys me when DBAs of certain DBs (OK, I'm talking about Oracle here) write SQL Server code using Oracle style and code conventions and complain when it runs so badly. Enough with the cursors, Oracle people! SQL is meant to be set-based.

- 36,306
- 34
- 114
- 197
-
5I think this is more related to your DBA than it is to Oracle. Oracle advices people to think and act set based too instead of row by row procedural thinking with cursors. – tuinstoel Jan 01 '09 at 13:37
-
You are probably right tuinstoel. But we have numerous DBA's in my company and they all seem to love cursors. – Craig Jan 05 '09 at 01:47
-
1Then they're not very good DBA's.... You don't happen to work in the same place as me do you? ;) – Andrew Rollings Jan 21 '09 at 19:12
Not using the With clause or a proper join and relying on sub-queries.
Anti-Pattern:
select
...
from data
where RECORD.STATE IN (
SELECT STATEID
FROM STATE
WHERE NAME IN
('Published to test',
'Approved for public',
'Published to public',
'Archived'
))
Better:
I like using the with clause to make my intent more readable.
with valid_states as (
SELECT STATEID
FROM STATE
WHERE NAME IN
('Published to test',
'Approved for public',
'Published to public',
'Archived'
))
select ... from data, valid_states
where data.state = valid_states.stateid
Best:
select
...
from data join state using (stateid)
where
state.name in ('Published to test',
'Approved for public',
'Published to public',
'Archived'
)

- 13,412
- 10
- 56
- 82
-
1Usually called a CTE or Common Table Expression, there have got to be better examples than this, as your anti-pattern and best examples are more similar and your thesis is that better should be the pattern – Thronk Aug 03 '16 at 15:00
-
Have you looked at the execution plans of these examples? Besides style, and personal appeal, there is no proof of which is faster or any other way "better". – Paul-Sebastian Manole Jul 04 '22 at 22:09