Actually, the system behaves exactly the inverse of what I was expecting:
13 ?- length(L,1000000),time(reverse(L,[X|_])),statistics.
% 1,000,002 inferences, 0.422 CPU in 0.424 seconds (100% CPU, 2367605 Lips)
% Started at Tue Mar 10 14:31:08 2015
% 523.563 seconds cpu time for 7,770,847 inferences
% 13,661 atoms, 4,540 functors, 3,987 predicates, 81 modules, 214,610 VM-codes
%
% Limit Allocated In use
% Local stack: 268,435,456 12,288 1,904 Bytes
% Global stack: 268,435,456 100,659,184 72,011,904 Bytes
% Trail stack: 268,435,456 129,016 2,280 Bytes
%
% 8 garbage collections gained 1,837,408 bytes in 1.346 seconds.
% Stack shifts: 13 local, 68 global, 47 trail in 0.034 seconds
% 2 threads, 0 finished threads used 0.000 seconds
L = [_G1238, _G1241, _G1244, _G1247, _G1250, _G1253, _G1256, _G1259, _G1262|...].
14 ?- length(L,1000000),time(append(_,[X],L)),statistics.
% 999,999 inferences, 0.572 CPU in 0.574 seconds (100% CPU, 1747727 Lips)
% Started at Tue Mar 10 14:31:08 2015
% 536.544 seconds cpu time for 8,772,339 inferences
% 13,662 atoms, 4,540 functors, 3,987 predicates, 81 modules, 214,615 VM-codes
%
% Limit Allocated In use
% Local stack: 268,435,456 12,288 2,960 Bytes
% Global stack: 268,435,456 50,327,536 48,011,920 Bytes
% Trail stack: 268,435,456 30,712 2,312 Bytes
%
% 8 garbage collections gained 1,837,408 bytes in 1.346 seconds.
% Stack shifts: 13 local, 72 global, 50 trail in 0.036 seconds
% 2 threads, 0 finished threads used 0.000 seconds
L = [_G1240, _G1243, _G1246, _G1249, _G1252, _G1255, _G1258, _G1261, _G1264|...]
.
Seems that reverse/2 is using 2 times the allocation of append/3. The global and trail stacks usage is double for reverse/2, a consequence of reverse/2 being cleverly compiled to reverse/4....