0

long time listener, first time caller here...

I am trying to use SQL to calculate the average of several values across a row. All suggestions I've seen thus far say to simply Sum the values and divide by the number of values used, which would work except that it is common for some of my values to be NULL, in which case this calculation does not work. Is there a SQL statement I can use to get the average per Identifier per date across a row of values when some of those values may be NULL?

in attached image from Excel - Excel properly captures the average and ignores the NULL values. Can SQL do the same and get the same values?

enter image description here

Tried known suggestions for summing the values and dividing by the number of values, which does not work when those values are NULL.

Ginger
  • 3
  • 3
  • What value do you want the average of 5 and null to be? (5, 2.5 or something different)? – MTilsted Aug 18 '23 at 13:22
  • If using two values and one is Null, then the average would be 5 in what I'm doing (attempting). Null should not equal zero in this equation. – Ginger Aug 18 '23 at 13:31
  • Can you share the SQL you tried and also tag with the flavor of sql system you're using? I expect SUM and COUNT should respect the null behavior that you want, so it might be helpful to see how you were doing it to get a different result – Josh Aug 18 '23 at 13:34
  • This SQL at least works to pull back a value, but in cases where the values included a NULL, the result was NULL and not the average of those columns where the value was not NULL – Ginger Aug 18 '23 at 13:46
  • SELECT IDENTIFIER, BUSINESS_DATE, SUM((VALUE_1+ VALUE_2+ VALUE_3+ VALUE_4+ VALUE_5+ VALUE_6))/6) WHERE BUSINESS_DATE = '2023-08-17' GROUP BY IDENTIFIER, BUSINESS_DATE – Ginger Aug 18 '23 at 13:47
  • this attempt didn't work because it said the AVG function requires One (1) argument: SELECT IDENTIFIER, BUSINESS_DATE, AVG(VALUE_1, VALUE_2, VALUE_3, VALUE_4, VALUE_5, VALUE_6) WHERE BUSINESS_DATE = '2023-08-17' GROUP BY IDENTIFIER, BUSINESS_DATE – Ginger Aug 18 '23 at 13:48
  • apologies for the formatting - not sure how to format in a paragraph form/create spaces between thoughts. – Ginger Aug 18 '23 at 13:48
  • Then just add "and FieldYouTryToSum is not null". Then you remove all rows with null from the result, before you sum them. – MTilsted Aug 18 '23 at 13:59
  • I don't want to remove the rows which contain a value that is NULL. It's completely normal in what I'm doing to not have gotten data for a particular value and to have a NULL in that field. therefore I need to get the average of the rows in which a value of NULL might still be present, but I cannot treat it as zero. – Ginger Aug 18 '23 at 14:21

4 Answers4

0

Do not expect SQL rows to work like Excel rows. SQL is more about all the data in a column, so if you want to sum values in a single column you're good to go. If you're working with multiple columns you're going to have to explicitly reference each of them, and if you're doing that you can easily do the math to create the result column (eg SELECT A,B,C, (A+B+C)/3 AS result FROM TABLE)

Normally you'd have a value for many records, and then use SQL built-in commands like AVG or SUM to produce the answer.

Given this, I think your best approach is to export the SQL data into Excel and operate on it there.

In SQL NULL is not the same as 0 (NULL in SQL is really a "does not exist" value, so that the average of 3 entries '5', '7', and 'NULL' will give an average of 6, not 4)

You can convert a NULL into 0 using the IsNull operator, eg IsNull(col,0))

gbjbaanb
  • 51,617
  • 12
  • 104
  • 148
  • therein lies the rub - I am pulling data into Tableau which also doesn't intuitively support an average function (plus it won't let me then perform a calculation using the calculated field). I need the report to update automatically daily without pulling into Excel. Thanks, though! – Ginger Aug 18 '23 at 14:08
  • Returning a zero for a NULL would skew my results since zero is not the same as NULL in what I'm trying to accomplish, unfortunately. – Ginger Aug 18 '23 at 14:22
  • "In SQL NULL is not the same as 0 (NULL in SQL is really a "does not exist" value, so that the average of 3 entries '5', '7', and 'NULL' will give an average of 6, not 4)" this is exactly what I need, but across a row and not within a single column. – Ginger Aug 18 '23 at 14:29
  • @ginger as I said, SQL stuff works on the data in columns. You can convert a row to column using the [Pivot](https://stackoverflow.com/questions/13372276/simple-way-to-transpose-columns-and-rows-in-sql) (search it out, its complicated enough to warrant proper investigation) that you can apply as you fetch each row. – gbjbaanb Aug 18 '23 at 19:15
  • A simpler link is the dupe https://stackoverflow.com/a/7368597/73226 – Martin Smith Aug 18 '23 at 19:22
0

You can use a combination of sum, coalesce and conditional aggregation.

This has been UPDATED to account for all null values. Basically, wrap the denominator with a NULLIF.

create table my_data (
  identifier integer, 
  date date, 
  value1 double precision, 
  value2 double precision, 
  value3 double precision, 
  value4 double precision, 
  value5 double precision, 
  value6 double precision
);
  
insert into my_data values 
(123, '2023-08-17', 4.9176, 5.0943, 5.0925, 3.6750, 3.7625, 3.3895), 
(987, '2023-08-17', 5.2808, null, 5.3000, 5.8750, null, 6.4117), 
(222, '2023-08-17', null, null, null, null, null, null); 
select identifier, date, 
    sum(
    coalesce(value1, 0) + 
    coalesce(value2, 0) + 
    coalesce(value3, 0) + 
    coalesce(value4, 0) + 
    coalesce(value5, 0) + 
    coalesce(value6, 0)
    ) / 
    nullif(sum(
    case when value1 is null then 0 else 1 end +
    case when value2 is null then 0 else 1 end + 
    case when value3 is null then 0 else 1 end + 
    case when value4 is null then 0 else 1 end + 
    case when value5 is null then 0 else 1 end + 
    case when value6 is null then 0 else 1 end
    ), 0) as average
from my_data
group by identifier, date
order by identifier
identifier date average
123 2023-08-17 4.3219
222 2023-08-17 null
987 2023-08-17 5.716875

fiddle

Isolated
  • 5,169
  • 1
  • 6
  • 18
  • I really like this and it's cleaner than what I ultimately came up with, however, when all values are null, SQL didn't like dividing by zero and threw an error. in the event all values are NULL, I would want the result to be Null (as opposed to leaving out the identifier altogether). – Ginger Aug 18 '23 at 15:18
  • I really appreciate all the help!! – Ginger Aug 18 '23 at 15:22
  • Just updated to account for all null values. If this resolves your question, then please consider marking the Accept flag. – Isolated Aug 18 '23 at 15:29
0

I eventually found a method that worked using some variations of suggestions here. Essentially, I added up the values in a row using a CASE WHEN VALUE_1 IS NOT NULL THEN VALUE_1 ELSE 0 END + CASE WHEN VALUE_2 IS NOT NULL THEN VALUE_2 ELSE 0 END +" and so on...then added up the values in a row where not null by 'CASE WHEN VALUE_1 IS NOT NULL THEN 1 ELSE 0 END + CASE WHEN VALUE_2 IS NOT NULL THEN 1 ELSE 0 +" and so on... used that to create a table that provided the sum of the values across the row as well as the sum of the NOT NULL cells across the row, from which I will add an additional statement like SUM(TOTAL_VALUES/TOTAL_COUNT) which will give me my average. A little bit of a windy path to get there, but it works.

Thanks all!

Ginger
  • 3
  • 3
0

AVG() is the function that you are talking about, and it exists in SQL standard, but (as well as SUM()) works vertically, not horizontally. It represents the so called aggregate functions. Which means you need to give it a column name and it will return the average value of that column for all the rows returned by your SQL statement or for the group identified by GROUP BY parameters of your SQL statement. See official docs.

You seem to calculate the average value of several values in each row, so the problem is not with the SUM function, but with the addition operator: when you add several values and one of them is NULL, the whole expression becomes NULL. This is normal, in SQL NULL is an unknown value and if you add unknown to a known value, the result becomes unknown.

To overcome this, you can either

  • organize values vertically, as naturally designed in SQL, or
  • rewrite the statement to add and count only known values:
(
  ISNULL(VALUE_1, 0) +
  ... +
  ISNULL(VALUE_6, 0)
)
/
(
  IIF(VALUE_1 IS NULL, 0, 1) +
  ...
  IIF(VALUE_6 IS NULL, 0, 1)
)

And to avoid the division by zero error (in case all 6 values are NULL), we need to wrap the whole expression in additional IIF:

IIF(
  VALUE_1 IS NULL AND
  ...
  VALUE_6 IS NULL,
  NULL,
  (the_above_expression)
)
  
Stas Trefilov
  • 173
  • 1
  • 6