Behavior of Selecting Non-Existent Columns

Question

Given a table with this structure:

create table person (
  id int identity primary key,
  name varchar(256),
  birthday datetime
)

For this query:

select id,name,birthday,haircolor from person

What is the rational behind SQL throwing an error in this situation? I was considering how flexible it would be if queries would simply return null for non existent columns. What would be concrete reasons for or against such a SQL language design?

Why? With the error message, you know what the issue is. If the database engine gave you a column called haircolor with all NULL records in the results, you wouldn't know if that column actually exists in the database or not. Perhaps haircolor does exist, but simply contains all NULLs. — Ellesedil, Aug 19 '14 at 18:56
Also, there's no definitive answer to this question as any answer would be subjective to a person's personal preference. — Ellesedil, Aug 19 '14 at 18:58
I don't consider the question particularly subjective. Values for a non-existent column are undefined, and null is undefined, so it makes sense to me for the database to just say that, and allow for queries to better deal with different database versions. — quillbreaker, Aug 19 '14 at 19:01
I don't agree with your logic, but I agree it is not a subjective question. There are very concrete reasons why silent errors are not allowed. One can look at a wide variety of scenarios and show evidence of why such an approach would be very bad. — AaronLS, Aug 19 '14 at 19:03
@quillbreaker - though I understand why you'd consider this, you are ultimately asking for 'error suppresion'. IE, please do the wrong thing instead of return an error. If MySQL has taught me anything, it's that is the worst possible idea for a code engine...ever. — Twelfth, Aug 19 '14 at 19:06
@Twelfth - I see your point. I work in extractions a lot recently, where I have to do ETL against lots of other people's databases, some of which are bad in various ways. We were discussing this at work and I was curious if the collective wisdom of stack overflow would enlighten me on the downsides of this, were it to be implemented. — quillbreaker, Aug 19 '14 at 19:09
Why the downvotes? How is this question different from this? They are both asking about what are essentially silent errors. Just because you perhaps don't like the premise of the question doesn't make it bad. It still is a good question about the reason languages are designed with compile/parse time errors. Just like this other question is about why we don't allow silent return errors and force errors threw runtime exceptions: http://stackoverflow.com/questions/99683/which-and-why-do-you-prefer-exceptions-or-return-codes — AaronLS, Aug 19 '14 at 19:21
@AaronLS the question in the link you referenced asks the difference in avenues a developer can take when returning values to the parent, thus being able to make a decision that best suits his needs. This question on the other hand is asking why something does not do something in the way the poster prefers/expects. Both very different. It's like one is talking about whether he should use a car or the public transport, and the other about why life doesn't work in the way he prefers/expects it to. :) — Pavan, Aug 19 '14 at 19:30
@Pavan They both ask "Why this way?" Read the title of the other. It asks two questions "why do you prefer Exceptions or Return codes?" That is more subjective because it is asking about preferences. This is question is asking about the design of the language and in that respect is more concrete. It is phrased poorly such that there is a premise to the question. Someone could have easily asked that other question with a poor premise of "Why don't we use return values to communicate errors?" and the top answer would have still been a valid answer. Doesn't change the answerability. — AaronLS, Aug 19 '14 at 19:35
@Pavan I would certainly hope that anyone considering this approach would find this question and read the concrete evidence of why it is a bad approach, before they proceed. — AaronLS, Aug 19 '14 at 19:37
@AaronLS - any advice on how to reword the last paragraph of the question? I agree that the wording is poor. — quillbreaker, Aug 19 '14 at 19:40
@quillbreaker, is your main reason for asking this question to find out why the output cannot just be returned as nil? I feel you already have your answer to your question quil. Let us know if it is actually something else that youre after. — Pavan, Aug 19 '14 at 19:47
@quillbreaker Remove the "Why doesn't" and "Wouldn't it be better" from title and maybe title of "Behavior of Selecting non-existant columns" and in body in bold the question "What is the rational behind SQL throwing an error in this situation? I was considering how flexible it would be if queries would simply return null for non existent columns. What would be concrete reasons for or against such a SQL language design?" — AaronLS, Aug 19 '14 at 19:47
@AaronLS while your rewording is much better, I feel we're moving away from his original question. We all know what his original question was and the premise underwhich he asked it in the first place, to do anything else after having received all the answers, would just be going around in circles. Lets move onto the next question now that he's gotten the answer to his original question that he had in mind when writing this post. He wasnted interested in the reasons for `for or against such a SQL language design`. Correct me if im wrong @quillbreaker. — Pavan, Aug 19 '14 at 19:50
@AaronLS - thank you, i couldn't manage that degree of eloquence today. — quillbreaker, Aug 19 '14 at 19:50
@Pavan - Primary goal : I want to understand why it is better for sql engines to do it the way they do it now. Secondary Goal : Towards the primary goal, I don't want the question to get closed. — quillbreaker, Aug 19 '14 at 19:51
@quillbreaker, ah fair enough :D. I guess it was just a touch of 'eloquence' that was missing from the question. No problem, if you were really interested in a for and against, then no worries, we can give you that! :D — Pavan, Aug 19 '14 at 19:51

Pavan · Answer 1 · 2014-08-19T19:26:23.157

No, because then you'd assume there is a haircolour column which could then have the following implications, just as an example:

You'd think you could insert into the field since you would assume that the column exists,
The returned result would not be an accurate representation of the database schema.

Errors are given so that you can have a clear understanding and indication of what you can and cannot do. It's also there so you can catch exceptions, bugs, spiders, and of course creepy crawlies as soon as possible. :)

We dont wan't to accomdate lazy developers.

Do the right thing, be a man - Russel Peters.

score 3 · Answer 2 · answered Aug 19 '14 at 19:02

This would be inconsistent for many reasons. One simple example is use of * to select the column: when you do this

select id, name, birthday, hair_color from person

and get back a table with all four columns present (hair_color is set to null on all rows) you would reasonably expect that the query below

select * from person

returned at least four columns, and that hair_color is among the columns returned. However, this wouldn't be the case if SQL allowed non-existent columns to return nulls.

This would also create hard-to-find errors when a column gets renamed in the schema, but some of the queries or stored procedures do not get re-worked to match the new names.

Generally, though, SQL engine developers make tradeoffs between usability and "hard" error checking. Sometimes, they would relax a constraint or two, to accommodate some of the most common "simple" mistakes (such as forgetting to use an aggregate function in a grouped query; it is an error, but MySql lets you get away with it). When it comes to schema checks, though, SQL engines do not allow any complacency, treating all missing schema elements as errors.

score 1 · Answer 3 · answered Aug 19 '14 at 19:00

MySQL has one horrible bug in it...

select field1,field2,filed3
from table 
group by field1

Any database engine would return 'error, wtf do I do with the field2 and field3 in the select line when they are not an aggregate nor in the group by statement'.

MySQL on the other hand will return two random values for field 2 and 3, and not return an error (which I refer to as 'doing the wrong thing and not returning an error'). THis is a horrid bug, the number of scripts of troubleshooted to discover theat MySQL is not handling group by's correctly is absurd...give an error before giving unintentional results and this huge bug won't be such an issue

Doesn't it seem to you that you are requesting more of this stupid behaviour to be propagated...just in the select clause instead of the group by clause?

edit:

typo propagation as well.

select age,gender, haricolour from...

I'd prefer to get an error back saying 'great typo silly' instead of a misnamed field full of nulls.

score 1 · Answer 4 · answered Aug 19 '14 at 19:02

This would result in a silent errors problem. The whole reason you have compile time errors and runtime exceptions is to catch bugs as soon as possible.

If you had a typo in a column name and it returned null instead of an error, your application will continue to work, but various things would misbehave as a result. You might have a column that saves a user setting indicating they don't want to receive emails. However your query has a typo and so it always returns null. Gradually a few people begin to report that they are setting the "Do not send emails" setting but are still getting emails. You have to hunt through all your code to figure out the cause. First you look at the form that edits this setting, then at the code that calls the database to save the setting, then you verify the data exists and the setting is getting persisted, then you look at the system that sends emails, and work your way up to the DB layer there that retrieves settings and painstakingly look through the SQL for that typo.

How much easier would that process be if it had just thrown an error in the first place? No users frustrated. No wasting time with support requests. No wasting time troubleshooting.

score 0 · Answer 5 · answered Aug 19 '14 at 18:57

Returning null would be too generic. I like it this way much better. In your flow you can probably catch errors and return null if you want. But giving more info about why the query failed is better than receiving a null (and probably have you scratch your head on your JOIN and WHERE clauses

Behavior of Selecting Non-Existent Columns

5 Answers5