I've stumbled upon something weird and I don't understand why it works that way.

I have an array of numbers, all of them unique:

$array = [
    98602142989816970,
    98602142989816971,
    98602142989816980,
    98602142989816981,
    98602142989816982,
    98602142989816983,
    98602142989820095,
    98602142989820096,
    98602142989822060,
    98602142989822061,
];
var_dump($array);
array(10) {
  [0]=>
  int(98602142989816970)
  [1]=>
  int(98602142989816971)
  [2]=>
  int(98602142989816980)
  [3]=>
  int(98602142989816981)
  [4]=>
  int(98602142989816982)
  [5]=>
  int(98602142989816983)
  [6]=>
  int(98602142989820095)
  [7]=>
  int(98602142989820096)
  [8]=>
  int(98602142989822060)
  [9]=>
  int(98602142989822061)
}

If I do print_r(array_unique($array)); everything is fine, I get:

Array
(
    [0] => 98602142989816970
    [1] => 98602142989816971
    [2] => 98602142989816980
    [3] => 98602142989816981
    [4] => 98602142989816982
    [5] => 98602142989816983
    [6] => 98602142989820095
    [7] => 98602142989820096
    [8] => 98602142989822060
    [9] => 98602142989822061
)

But if I add the SORT_NUMERIC flag, print_r(array_unique($array, SORT_NUMERIC)); gives:

Array
(
    [0] => 98602142989816970
    [6] => 98602142989820095
    [8] => 98602142989822060
)

Why are only those 3 numbers returned?

Update: I'm on a 64-bit system.

For the sort functions I've manually shuffled some of the values, because in the original array they are already sorted.

If I do sort($array); then the result is as expected:

Array
(
    [0] => 98602142989816970
    [1] => 98602142989816971
    [2] => 98602142989816980
    [3] => 98602142989816981
    [4] => 98602142989816982
    [5] => 98602142989816983
    [6] => 98602142989820095
    [7] => 98602142989820096
    [8] => 98602142989822060
    [9] => 98602142989822061
)

But with sort($array, SORT_NUMERIC);, they are sorted incorrectly:

Array
(
    [0] => 98602142989816970
    [1] => 98602142989816982
    [2] => 98602142989816983
    [3] => 98602142989816980
    [4] => 98602142989816981
    [5] => 98602142989816971
    [6] => 98602142989820095
    [7] => 98602142989820096
    [8] => 98602142989822060
    [9] => 98602142989822061
)
Arthur Shveida
  • Probably because your values are outside of the integer range, so they get converted to float when you force that numeric context … and then the inherent imprecision of floats starts to bite. What kind of system are you on, 32bit or 64bit? – CBroe Feb 20 '20 at 11:00
  • Those are definitely not integers, find out with gettype(var). – Andrea Golin Feb 20 '20 at 11:02
  • On a 32bit system, your array printed without applying array_unique gives the values as `9.8602142989817E+16`, `9.8602142989817E+16`, … already. If you are on a 64bit system, where these integers can be represented correctly to begin with, something probably goes wrong when SORT_NUMERIC comes into play - maybe that forces the use of 32bit / conversion to float internally again or something … – CBroe Feb 20 '20 at 11:04
  • That's definitely interesting, I think @CBroe's suggestion is right. Does it work if you use `sort()` instead of using the flag? – Mark Feb 20 '20 at 11:08
  • What if you convert it to strings and compare then? – Justinas Feb 20 '20 at 11:15
  • @MarkOverton I've updated the question with `sort()` examples – Arthur Shveida Feb 20 '20 at 11:39
  • @Justinas With strings it behaves the same way – Arthur Shveida Feb 20 '20 at 11:39

2 Answers

You're running into an issue with precision and floating point arithmetic at that scale. There's a lot more information available at "Is floating point math broken?" if you're interested, but I don't think this quite counts as a duplicate of that.

Taking your first two numbers:

php > var_dump((float) 98602142989816970 === (float) 98602142989816971);
bool(true)

php > var_dump((float) 98602142989816970, (float) 98602142989816971);
float(9.8602142989817E+16)
float(9.8602142989817E+16)

Internally, this is what's happening when PHP compares the values in your array using SORT_NUMERIC, deep down in numeric_compare_function.
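To make the collapse visible, here is a minimal sketch that round-trips each value through float and back. It assumes a 64-bit PHP build (so the literals start out as int) and only imitates the effect of a float-based comparison; it does not call the internal function. Between 2^56 and 2^57, adjacent floats are 16 apart, so neighbouring integers in that range land on the same float.

```php
<?php
// Assumes a 64-bit PHP build, where these literals are stored as int.
$array = [
    98602142989816970, 98602142989816971, 98602142989816980,
    98602142989816981, 98602142989816982, 98602142989816983,
    98602142989820095, 98602142989820096,
    98602142989822060, 98602142989822061,
];

// Round-trip each value through float, imitating a float-based comparison.
$roundTripped = array_map(
    function (int $value): int {
        return (int) (float) $value;
    },
    $array
);

// Collect the distinct values that survive the float conversion.
$distinct = array_values(array_unique($roundTripped));

foreach ($array as $i => $value) {
    printf("%d -> %d\n", $value, $roundTripped[$i]);
}
printf("%d distinct values after the float round trip\n", count($distinct));
```

Only three distinct values remain, which lines up with the three elements that array_unique returned with SORT_NUMERIC in the question.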

sort suffers from the same issue, see https://3v4l.org/02UUB. (Obviously no values are removed from the array, since that only happens in array_unique; they just aren't sorted properly.)

In short, with numbers this size (or, more precisely, numbers that are very close together relative to their scale), SORT_NUMERIC isn't going to be reliable. Stick with comparing them as strings (SORT_STRING, the default for array_unique) if you can.
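As a rough sketch of that advice, assuming a 64-bit build and using a few made-up sample IDs, the default flags avoid the float conversion entirely. The explicit usort comparator is just one more option for making the integer comparison obvious.

```php
<?php
// Hypothetical sample IDs, chosen to sit closer together than a float can resolve.
$ids = [
    98602142989816971,
    98602142989816970,
    98602142989816971, // a genuine duplicate
    98602142989816983,
];

// Default flag (SORT_STRING): compares the decimal string form, exact for ints.
$unique = array_values(array_unique($ids));

// Default flag (SORT_REGULAR): two ints are compared as ints, with no float cast.
sort($unique);

// An explicit integer comparator, for when the intent should be obvious.
$sorted = $ids;
usort($sorted, function (int $a, int $b): int {
    return $a <=> $b;
});

var_dump($unique, $sorted);
```

Here only the real duplicate is dropped, and both results come out in true numeric order.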

iainn

It makes a difference whether the code runs under a 32-bit or a 64-bit build of PHP, because PHP's integer type is correspondingly 32 or 64 bits wide.

$array = [
    98602142989816970,
    98602142989816971,
    98602142989816980,
    98602142989816981,
    98602142989816982,
    98602142989816983,
    98602142989820095,
    98602142989820096,
    98602142989822060,
    98602142989822061,
];
echo '<pre>';
var_dump(PHP_INT_MAX,$array);

The result on a 32-bit system:

int(2147483647)
array(10) {
  [0]=>
  float(9.8602142989817E+16)
  [1]=>
  float(9.8602142989817E+16)
  [2]=>
  float(9.8602142989817E+16)
  [3]=>
  float(9.8602142989817E+16)
  [4]=>
  float(9.8602142989817E+16)
  [5]=>
  float(9.8602142989817E+16)
  [6]=>
  float(9.860214298982E+16)
  [7]=>
  float(9.860214298982E+16)
  [8]=>
  float(9.8602142989822E+16)
  [9]=>
  float(9.8602142989822E+16)
}

PHP converts the values to float right away because they are all greater than PHP_INT_MAX.

The result on a 64-bit system:

int(9223372036854775807)
array(10) {
  [0]=>
  int(98602142989816970)
  [1]=>
  int(98602142989816971)
  [2]=>
  int(98602142989816980)
  [3]=>
  int(98602142989816981)
  [4]=>
  int(98602142989816982)
  [5]=>
  int(98602142989816983)
  [6]=>
  int(98602142989820095)
  [7]=>
  int(98602142989820096)
  [8]=>
  int(98602142989822060)
  [9]=>
  int(98602142989822061)
}

On the 32-bit system, array_unique reduces the array because some values also exceed the precision of a float.

If the SORT_NUMERIC option is not used, array_unique() and sort() work properly for the 64-bit version.
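If the code may run on either kind of build, a small guard can confirm that the values really are native integers before integer comparison is relied upon. This is only a sketch; `allNativeInts` is a hypothetical helper name, not a built-in function.

```php
<?php
/**
 * Hypothetical helper: true only if every element is a native int,
 * i.e. no literal was silently turned into a float.
 */
function allNativeInts(array $values): bool
{
    foreach ($values as $value) {
        if (!is_int($value)) {
            return false;
        }
    }
    return true;
}

$array = [98602142989816970, 98602142989816971];

printf("PHP_INT_SIZE = %d bytes\n", PHP_INT_SIZE);
var_dump(allNativeInts($array)); // true on 64-bit builds; false where the literals overflow to float
```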

jspit