3

I'm curious why the following two expressions have a different result -- one NULL and the other TRUE:

postgres=# select array[1,null]=array[1,null];
 ?column? 
----------
 t
(1 row)

postgres=# select (1,null)=(1,null);
 ?column? 
----------
 
(1 row)

What's the reasons for this? And in your answer could you please link to the docs in Postgres or some SQL reference where it says the behavior of the row- and list-type when comparing with a NULL value inside?

David542
  • 104,438
  • 178
  • 489
  • 842
  • 1
    You always come up with edge questions. The second one returns uknown and that makes sense. The first one... I don't know how arrays work. – The Impaler Jul 30 '23 at 22:49

2 Answers2

8

Arrays containing null in the same position(s) compare equal if all else is equal. Only actual null values (including array values as a whole) return null when compared:

SELECT null::int[] = null::int[];  --> null

For row comparison, Postgres follows the SQL standard. The manual on Row and Array Comparisons:

The forms involving array subexpressions are PostgreSQL extensions; the rest are SQL-compliant.

The manual on Row Constructor Comparison:

The = and <> cases work slightly differently from the others. Two rows are considered equal if all their corresponding members are non-null and equal; the rows are unequal if any corresponding members are non-null and unequal; otherwise the result of the row comparison is unknown (null).

Bold emphasis mine. So row comparison is subtly different from array comparison. If nested null values in row values shall compare equal (like in arrays), use IS NOT DISTINCT FROM instead of =:

SELECT (1,null) IS NOT DISTINCT FROM (1,null);  --> true
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • thanks, wonderful answer. The only thing that I'd add is you cannot use `IS [NOT] DISTINCT FROM` within the `IN (values)` construction, correct? For example, `(1,null) IN ((1,null), (2,null))` -- or am I mistaken there / or is there an alternate way to do that? – David542 Jul 31 '23 at 02:01
  • 1
    @David542: That sounds like another, even trickier question - deserving its own question. Be sure to specify whether you want to test anonymous records or well-known row types. And whether you want to input a *list*, an *array* or *set* - or you don't care. And whether it must be the `IN` construct. In other words, what do you need *exactly*? – Erwin Brandstetter Jul 31 '23 at 03:25
  • sounds good. I'll ask a follow-up question. By the way, what is the difference between an (1) array, (2) set, and (3) list in Postgres? The only one I'm familiar with in terminology is the array -- e.g., `ARRAY[1,2]` or `'{1,2}'`. – David542 Jul 31 '23 at 18:39
  • 1
    @David542: About *list*, *array*, and *set* in connection with `IN`, see: https://stackoverflow.com/a/34627688/939860 – Erwin Brandstetter Jul 31 '23 at 22:34
  • Oh, got it. Thanks for the clarification, so List here is the expression list presented in `(val1, val2, val3, ...)`. Array is the normal `[1,2,3]` and Set would be a column returned in a subselect such as `1 IN (SELECT id FROM tbl)`. Is that a correct understanding? – David542 Aug 01 '23 at 00:02
  • @David542: Yes it is. – Erwin Brandstetter Aug 01 '23 at 00:16
1

source: https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/utils/adt/arrayfuncs.c#n3760
explain first query select array[1,null]=array[1,null];

3784-3787 compare two array base element data type.
3790-3793 two array dimension/bounds info check.
3796-3813 check weather array base types have equality function. some data type don't have equality function. 3825-3828 set up iteration for two array.
3838-3853 for each base element in an array, either Datum or null. so iterate and copy datum or null (one element either datum (not null) or null) to variable (it1, isnull1, it2, isnull2).

You answer in line 3845 and line 3848.

src/backend/utils/adt/arrayfuncs.c

3760: Datum
3761: array_eq(PG_FUNCTION_ARGS)
3762: {
3763:   LOCAL_FCINFO(locfcinfo, 2);
3764:   AnyArrayType *array1 = PG_GETARG_ANY_ARRAY_P(0);
3765:   AnyArrayType *array2 = PG_GETARG_ANY_ARRAY_P(1);
3766:   Oid         collation = PG_GET_COLLATION();
3767:   int         ndims1 = AARR_NDIM(array1);
3768:   int         ndims2 = AARR_NDIM(array2);
3769:   int        *dims1 = AARR_DIMS(array1);
3770:   int        *dims2 = AARR_DIMS(array2);
3771:   int        *lbs1 = AARR_LBOUND(array1);
3772:   int        *lbs2 = AARR_LBOUND(array2);
3773:   Oid         element_type = AARR_ELEMTYPE(array1);
3774:   bool        result = true;
3775:   int         nitems;
3776:   TypeCacheEntry *typentry;
3777:   int         typlen;
3778:   bool        typbyval;
3779:   char        typalign;
3780:   array_iter  it1;
3781:   array_iter  it2;
3782:   int         i;
3783: 
3784:   if (element_type != AARR_ELEMTYPE(array2))
3785:       ereport(ERROR,
3786:               (errcode(ERRCODE_DATATYPE_MISMATCH),
3787:                errmsg("cannot compare arrays of different element types")));
3788: 
3789:   /* fast path if the arrays do not have the same dimensionality */
3790:   if (ndims1 != ndims2 ||
3791:       memcmp(dims1, dims2, ndims1 * sizeof(int)) != 0 ||
3792:       memcmp(lbs1, lbs2, ndims1 * sizeof(int)) != 0)
3793:       result = false;
3794:   else
3795:   {
3796:       /*
3797:        * We arrange to look up the equality function only once per series of
3798:        * calls, assuming the element type doesn't change underneath us.  The
3799:        * typcache is used so that we have no memory leakage when being used
3800:        * as an index support function.
3801:        */
3802:       typentry = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
3803:       if (typentry == NULL ||
3804:           typentry->type_id != element_type)
3805:       {
3806:           typentry = lookup_type_cache(element_type,
3807:                                        TYPECACHE_EQ_OPR_FINFO);
3808:           if (!OidIsValid(typentry->eq_opr_finfo.fn_oid))
3809:               ereport(ERROR,
3810:                       (errcode(ERRCODE_UNDEFINED_FUNCTION),
3811:                        errmsg("could not identify an equality operator for type %s",
3812:                               format_type_be(element_type))));
3813:           fcinfo->flinfo->fn_extra = (void *) typentry;
3814:       }
3815:       typlen = typentry->typlen;
3816:       typbyval = typentry->typbyval;
3817:       typalign = typentry->typalign;
3818: 
3819:       /*
3820:        * apply the operator to each pair of array elements.
3821:        */
3822:       InitFunctionCallInfoData(*locfcinfo, &typentry->eq_opr_finfo, 2,
3823:                                collation, NULL, NULL);
3824: 
3825:       /* Loop over source data */
3826:       nitems = ArrayGetNItems(ndims1, dims1);
3827:       array_iter_setup(&it1, array1);
3828:       array_iter_setup(&it2, array2);
3829: 
3830:       for (i = 0; i < nitems; i++)
3831:       {
3832:           Datum       elt1;
3833:           Datum       elt2;
3834:           bool        isnull1;
3835:           bool        isnull2;
3836:           bool        oprresult;
3837: 
3838:           /* Get elements, checking for NULL */
3839:           elt1 = array_iter_next(&it1, &isnull1, i,
3840:                                  typlen, typbyval, typalign);
3841:           elt2 = array_iter_next(&it2, &isnull2, i,
3842:                                  typlen, typbyval, typalign);
3843: 
3844:           /*
3845:            * We consider two NULLs equal; NULL and not-NULL are unequal.
3846:            */
3847:           if (isnull1 && isnull2)
3848:               continue;
3849:           if (isnull1 || isnull2)
3850:           {
3851:               result = false;
3852:               break;
3853:           }
3854: 
3855:           /*
3856:            * Apply the operator to the element pair; treat NULL as false
3857:            */
3858:           locfcinfo->args[0].value = elt1;
3859:           locfcinfo->args[0].isnull = false;
3860:           locfcinfo->args[1].value = elt2;
3861:           locfcinfo->args[1].isnull = false;
3862:           locfcinfo->isnull = false;
3863:           oprresult = DatumGetBool(FunctionCallInvoke(locfcinfo));
3864:           if (locfcinfo->isnull || !oprresult)
3865:           {
3866:               result = false;
3867:               break;
3868:           }
3869:       }
3870:   }
3871: 
3872:   /* Avoid leaking memory when handed toasted input. */
3873:   AARR_FREE_IF_COPY(array1, 0);
3874:   AARR_FREE_IF_COPY(array2, 1);
3875: 
3876:   PG_RETURN_BOOL(result);
3877: }
3878: 
jian
  • 4,119
  • 1
  • 17
  • 32