
Using the ramsey/uuid package, I tried generating a large number of v4 UUIDs.

<?php

require __DIR__ . '/vendor/autoload.php';
use Ramsey\Uuid\Uuid;

$initialMemoryUsage = memory_get_usage(true) / 1024 / 1024;
$test = [];

for ($i = 0; $i < 100000; $i++) {
    $test[] = Uuid::uuid4()->toString();
}

var_dump(sprintf('Memory used: %d MB', (memory_get_usage(true) / 1024 / 1024) - $initialMemoryUsage));

outputs: string(18) "Memory used: 10 MB"

<?php

$initialMemoryUsage = memory_get_usage(true) / 1024 / 1024;
$test = [];

for ($i = 0; $i < 100000; $i++) {
    $test[] = '97c2ca84-bcfe-4618-b8a3-4d404eead37a';
}

var_dump(sprintf('Memory used: %d MB', (memory_get_usage(true) / 1024 / 1024) - $initialMemoryUsage));

outputs: string(17) "Memory used: 4 MB"

Just invoking UUID generation, without storing the results, does not cause any memory increase:

for ($i = 0; $i < 100000; $i++) {
    Uuid::uuid4()->toString();
}

In both cases the result is an array of 100000 elements, each a string(36), so why does the amount of memory used differ? Any ideas?

php -v

PHP 7.3.2-3+ubuntu16.04.1+deb.sury.org+1 (cli) (built: Feb  8 2019 15:43:26) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.2, Copyright (c) 1998-2018 Zend Technologies
Zilu
  • Interesting - I wonder if there are memory optimisations available when the data is duplicated/uniform. If you were to increment the string in your static example so that each was unique, would it increase memory usage? – OK sure Oct 20 '20 at 13:26
  • @OKsure Indeed, if each is unique memory usage increases. – Zilu Oct 20 '20 at 13:37
  • @OKsure That's certainly worth checking. It's strange though that in the example with unique generated values it takes up 10MB to hold 100k of 36-byte values. I checked and it seems to scale linearly, e.g. for a million values it took ~100MB and for ten million it took over a gigabyte - over 700MB more than should be necessary judging by simple calculations. – Rafał G. Oct 20 '20 at 13:37
  • Why keep them in memory when generators are available? – Markus Zeller Oct 20 '20 at 13:39
  • I'm imagining, cause I'm out of my wheelhouse, that the static example can be condensed to only store the indexes and a single value. As an expression, it's very small. As for linear scaling, if it's also holding the indexes then that's going to add some overhead too perhaps – OK sure Oct 20 '20 at 13:42
  • In the first example there are objects involved. An object with a string property plus a copy of that string obviously needs more memory than a standalone string. Additionally, PHP uses garbage collection, so you can't be sure it will kick in right after each loop iteration. – Álvaro González Oct 20 '20 at 14:23
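
The experiment suggested in the comments above (make every string unique and see whether memory grows) can be tried without the library at all. The following is only a sketch: `fakeUuid()` and `measure()` are hypothetical helpers introduced for illustration, not part of ramsey/uuid, and the strings are merely UUID-shaped, not valid v4 UUIDs. It uses `memory_get_usage()` without the `true` argument, which gives finer-grained numbers than the calls in the question.

```php
<?php

// Hypothetical helper (not part of ramsey/uuid): builds a UUID-shaped,
// 36-character string from 16 random bytes.
function fakeUuid(): string
{
    $hex = bin2hex(random_bytes(16)); // 32 hex characters
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

// Hypothetical helper: fills an array with $count strings produced by
// $make and returns the growth of memory_get_usage() in bytes, measured
// while the array is still alive.
function measure(callable $make, int $count): int
{
    $before = memory_get_usage();
    $items = [];
    for ($i = 0; $i < $count; $i++) {
        $items[] = $make();
    }
    return memory_get_usage() - $before;
}

$count = 100000;

// Every call returns the same literal, so all elements can share it.
$identical = measure(function (): string {
    return '97c2ca84-bcfe-4618-b8a3-4d404eead37a';
}, $count);

// Every call returns a freshly built string that must be stored separately.
$unique = measure('fakeUuid', $count);

printf("identical: %.1f MB\n", $identical / 1024 / 1024);
printf("unique:    %.1f MB\n", $unique / 1024 / 1024);
```

The exact figures depend on the PHP version and build, but the unique case should come out clearly larger, which points at the strings themselves rather than at the library.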

1 Answer


Strings in PHP are immutable, which means they can't be changed. This also implies that they can easily be shared. In the first case, you have an array with 100k elements, each referencing a different string. In the second case, you have an array with 100k elements, each referencing the same string.
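
The sharing described above can be observed directly. This is a minimal sketch, assuming a stock PHP 7 or later CLI. It fills an array with one runtime-built string, then writes to each element so that every element is forced to get its own copy. The variable names are illustrative only.

```php
<?php

$count = 100000;

// Built at runtime so it is an ordinary value rather than a literal
// baked into the script.
$value = str_repeat('a', 36);

$before = memory_get_usage();
$strings = array_fill(0, $count, $value); // every slot refers to the same string
$sharedBytes = memory_get_usage() - $before;

$before = memory_get_usage();
for ($i = 0; $i < $count; $i++) {
    // Changing one character of an element gives that element its own
    // private copy of the string; $value and the other slots are untouched.
    $strings[$i][0] = 'b';
}
$separatedBytes = memory_get_usage() - $before;

printf("array of shared strings:   %.1f MB\n", $sharedBytes / 1024 / 1024);
printf("extra after separating:    %.1f MB\n", $separatedBytes / 1024 / 1024);
```

The first number is the cost of the array alone. The second is the additional cost of holding 100000 separate strings, which is the difference the question observed.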

For further reference, take a look at www.phpinternalsbook.com.

Ulrich Eckhardt
  • PHP strings are **mutable** but they use a copy-on-write mechanism (as explained [here](https://stackoverflow.com/a/22982909/12763954)). – Olivier Oct 26 '20 at 19:51
  • Yeah, after some experimentation with a function that generated random characters it turns out it's simply how PHP handles memory. It was confusing because for the amount of data I tested the memory used roughly corresponded with the simplistic formula (number of items x 36 bytes). So, the next question was: why is array usage overhead so big in PHP? The answer is here: https://nikic.github.io/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html - and some help comes from here: https://www.php.net/manual/en/class.splfixedarray.php - still 70% overhead, but better than 200%. – Rafał G. Oct 27 '20 at 08:08
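
The array versus SplFixedArray comparison mentioned in the last comment can be reproduced with a sketch like the one below. It stores plain integers so that only the container overhead is measured, not string storage. The size of the gap is likely to vary between PHP versions, so no expected figures are given here.

```php
<?php

$count = 100000;

// Regular PHP array, grown one element at a time.
$before = memory_get_usage();
$plain = [];
for ($i = 0; $i < $count; $i++) {
    $plain[] = $i;
}
$plainBytes = memory_get_usage() - $before;

// SplFixedArray: the size is fixed up front and only integer indexes
// in the range 0..$count-1 are allowed.
$before = memory_get_usage();
$fixed = new SplFixedArray($count);
for ($i = 0; $i < $count; $i++) {
    $fixed[$i] = $i;
}
$fixedBytes = memory_get_usage() - $before;

printf("array:         %.1f MB\n", $plainBytes / 1024 / 1024);
printf("SplFixedArray: %.1f MB\n", $fixedBytes / 1024 / 1024);
```

SplFixedArray gives up dynamic resizing and string keys, so it only fits cases where the number of elements is known in advance.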