I have been studying the Abstract Types and mainly the difference between signed integer types and unsigned integer types. I was really curious about the performance of Julia
so I measured the differences between those types. Since Julia is mostly implemented in C, I would assume almost the same behavior in abstract types. This question: performance of unsigned vs signed integers give some explanation about that unsigned integers leads to the same or better performance than signed. But in my benchmarking, I found that some Integers are faster than unsigned Integers. Here is some reproducible code:
julia> using BenchmarkTools
julia> Int_8 = rand(Int8, 1000)
julia> Int_16 = rand(Int16, 1000)
julia> Int_32 = rand(Int32, 1000)
julia> Int_64 = rand(Int64, 1000)
julia> Int_128 = rand(Int128, 1000)
julia> UInt_8 = rand(UInt8, 1000)
julia> UInt_16 = rand(UInt16, 1000)
julia> UInt_32 = rand(UInt32, 1000)
julia> UInt_64 = rand(UInt64, 1000)
julia> UInt_128 = rand(UInt128, 1000)
julia> @benchmark Int_8.^2
BenchmarkTools.Trial: 10000 samples with 147 evaluations.
Range (min … max): 698.333 ns … 54.836 μs ┊ GC (min … max): 0.00% … 98.34%
Time (median): 733.088 ns ┊ GC (median): 0.00%
Time (mean ± σ): 870.727 ns ± 2.209 μs ┊ GC (mean ± σ): 11.56% ± 4.49%
▇█▅▄▂▁ ▇▇▄▂▁ ▂▁ ▁ ▁▁ ▂
██████████████▇▆▆▆██████▇██████▇▆▅▆▆▄▄▅▆▆▆▄▄▅▆▆▆▆▆▅▅▄▅▄▄▄▅▃▄ █
698 ns Histogram: log(frequency) by time 1.28 μs <
Memory estimate: 1.14 KiB, allocs estimate: 5.
julia> @benchmark Int_16.^2
BenchmarkTools.Trial: 10000 samples with 16 evaluations.
Range (min … max): 986.625 ns … 3.895 ms ┊ GC (min … max): 0.00% … 99.93%
Time (median): 1.077 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.069 μs ± 77.347 μs ┊ GC (mean ± σ): 50.40% ± 2.00%
██▃ ▂▅▆▆▅▄▄▃▂▂▁ ▂
████▆▇▆▅▅▅▆▅▅▃▆▆▄████████████▇▅▅▅▅▄▄▆▇▆▆▇▇▇█▇▇▆▆▅▆▄▅▅▄▅▅▆▄▅▅ █
987 ns Histogram: log(frequency) by time 3.68 μs <
Memory estimate: 2.14 KiB, allocs estimate: 5.
julia> @benchmark Int_32.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.327 μs … 4.701 ms ┊ GC (min … max): 0.00% … 99.89%
Time (median): 1.474 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.481 μs ± 80.410 μs ┊ GC (mean ± σ): 39.99% ± 1.73%
▆█▆▃ ▁▄▄▄▄▃▃▂▁▁ ▁
█████▇▆▆▆▆▅▄▅▅▅▄▄▃▅███████████▇▇▅▅▅▆▆▆▄▅▅▆▅▆▄▆▅▅▇▆▅▅▅▆▅▅▆▆ █
1.33 μs Histogram: log(frequency) by time 5.82 μs <
Memory estimate: 4.14 KiB, allocs estimate: 5.
julia> @benchmark Int_64.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.749 μs … 2.949 ms ┊ GC (min … max): 0.00% … 99.73%
Time (median): 1.887 μs ┊ GC (median): 0.00%
Time (mean ± σ): 5.069 μs ± 84.920 μs ┊ GC (mean ± σ): 52.89% ± 3.15%
▇█▆▃▁ ▂▃▃▃▂▂▂▁▁ ▂
█████▇▇▇█▇▆▅▅▅▅▅▄▅▄▄▅▁▃▅▃▅▅███████████▇▇▆▅▃▅▄▁▄▅▄▅▆▅▇▆▅▅▆▆ █
1.75 μs Histogram: log(frequency) by time 7.31 μs <
Memory estimate: 8.02 KiB, allocs estimate: 5.
julia> @benchmark Int_128.^2
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
Range (min … max): 8.516 μs … 5.690 ms ┊ GC (min … max): 0.00% … 99.72%
Time (median): 8.778 μs ┊ GC (median): 0.00%
Time (mean ± σ): 10.962 μs ± 96.809 μs ┊ GC (mean ± σ): 15.28% ± 1.73%
▄█▅▃▁ ▁
██████▇▇▆▆▆▆▇▆▆▆▆▅▄▃▃▄▃▂▃▄▄▄▄▃▄▄▅▆▆▆▆▆▆▆▄▅▄▄▃▃▄▆▆▆▆▆▅▄▄▅▆▆▆ █
8.52 μs Histogram: log(frequency) by time 19.2 μs <
Memory estimate: 15.83 KiB, allocs estimate: 5.
julia> @benchmark UInt_8.^2
BenchmarkTools.Trial: 10000 samples with 104 evaluations.
Range (min … max): 845.058 ns … 76.695 μs ┊ GC (min … max): 0.00% … 98.21%
Time (median): 897.909 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.027 μs ± 2.643 μs ┊ GC (mean ± σ): 9.58% ± 3.68%
██▇▆▃▆█▇▅▄▃▂▁ ▁▁▁▁▁▂▁▁ ▁ ▃
███████████████▇▇▆▇█████████████▇▆▆▆▆▅▅▆▆▆▆▆▆▅▆▆▆▅▅▆▅▅▄▅▄▅▁▄ █
845 ns Histogram: log(frequency) by time 1.59 μs <
Memory estimate: 1.14 KiB, allocs estimate: 5.
julia> @benchmark UInt_16.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.032 μs … 6.110 ms ┊ GC (min … max): 0.00% … 99.96%
Time (median): 1.139 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.458 μs ± 83.275 μs ┊ GC (mean ± σ): 47.86% ± 1.41%
██▅▂▁ ▁▂▃▂▂▁▁ ▂
███████▆▆▆▆▆▄▇▆▆▇███████████▆▄▅▆▆▄▄▄▃▃▃▄▃▄▃▅▁▃▄▅▃▃▄▄▅▄▅▆▆▆ █
1.03 μs Histogram: log(frequency) by time 3.83 μs <
Memory estimate: 2.14 KiB, allocs estimate: 5.
julia> @benchmark UInt_32.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.205 μs … 4.694 ms ┊ GC (min … max): 0.00% … 99.91%
Time (median): 1.332 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.821 μs ± 101.562 μs ┊ GC (mean ± σ): 59.39% ± 2.23%
▇█▆▃ ▁▂▂▂▂▂▁ ▂
▇██████▆▇▇▆▆▆▇▇▅▅▄▄▁▃▃▄▄▃▄▇█████████▇▇▇▆▇▅▅▅▄▁▄▄▄▆▇▆▆▆▄▅▆▆▆ █
1.2 μs Histogram: log(frequency) by time 4.42 μs <
Memory estimate: 4.14 KiB, allocs estimate: 5.
julia> @benchmark UInt_64.^2
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.695 μs … 2.845 ms ┊ GC (min … max): 0.00% … 99.81%
Time (median): 1.842 μs ┊ GC (median): 0.00%
Time (mean ± σ): 4.734 μs ± 84.073 μs ┊ GC (mean ± σ): 56.11% ± 3.16%
▅██▆▃▁ ▁▁▁▁ ▂
███████▇▆▆▇▇▆▆▄▄▅▄▃▄▅▆▆▅▅▁▃▃▃▄▄▄▁▅▃▅▅▇█████████▇▇▇▆▆▄▆▆▄▄▃ █
1.7 μs Histogram: log(frequency) by time 5.71 μs <
Memory estimate: 8.02 KiB, allocs estimate: 5.
julia> @benchmark UInt_128.^2
BenchmarkTools.Trial: 10000 samples with 4 evaluations.
Range (min … max): 8.474 μs … 4.597 ms ┊ GC (min … max): 0.00% … 99.69%
Time (median): 8.691 μs ┊ GC (median): 0.00%
Time (mean ± σ): 10.980 μs ± 87.765 μs ┊ GC (mean ± σ): 15.96% ± 1.99%
▇█▆▄▃▁▁▁ ▁
█████████████▇█▇▇▇▆▆▆▅▅▄▆▄▄▄▄▅▄▄▂▄▅▄▄▄▅▆▇▇▇▆▇▇█▇▇▇▆▆▆▆▆▇▅▅▄ █
8.47 μs Histogram: log(frequency) by time 17 μs <
Memory estimate: 15.83 KiB, allocs estimate: 5.
julia> @btime $Int_8.^2
91.008 ns (1 allocation: 1.06 KiB)
julia> @btime $Int_16.^2
319.261 ns (1 allocation: 2.06 KiB)
julia> @btime $Int_32.^2
535.193 ns (1 allocation: 4.06 KiB)
julia> @btime $Int_64.^2
923.778 ns (1 allocation: 7.94 KiB)
julia> @btime $Int_128.^2
7.706 μs (1 allocation: 15.75 KiB)
julia> @btime $UInt_8.^2
89.870 ns (1 allocation: 1.06 KiB)
julia> @btime $UInt_16.^2
316.495 ns (1 allocation: 2.06 KiB)
julia> @btime $UInt_32.^2
539.642 ns (1 allocation: 4.06 KiB)
julia> @btime $UInt_64.^2
872.708 ns (1 allocation: 7.94 KiB)
julia> @btime $UInt_128.^2
7.061 μs (1 allocation: 15.75 KiB)
As you can see UInt8, UInt16 is slightly faster than Int8, Int16. But Int32 becomes faster than UInt32. Is there a reason for that? Int64 and Int128 become slower again. As you can see the performance changes at some point between unsigned integers and signed integers and crosses.
So I was wondering if anyone could explain why some signed Integers are faster than unsigned Integers and vice versa? Should there not be a more linear difference, because now it crosses performances within the types?
Edit: add benchmark like @Shayan mentioned in the comments
Here I added the benchmark. It seems that UInt8 and UInt32 of unsigned integers are faster and Int64 and Int128 Integers are faster, so still some difference between these Abstract Types.
julia> @benchmark $Int_8.^2
BenchmarkTools.Trial: 10000 samples with 958 evaluations.
Range (min … max): 81.209 ns … 9.620 μs ┊ GC (min … max): 0.00% … 97.35%
Time (median): 115.392 ns ┊ GC (median): 0.00%
Time (mean ± σ): 221.416 ns ± 855.670 ns ┊ GC (mean ± σ): 43.50% ± 10.98%
█▃ ▁
██▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▃▄▃▄▅▆ █
81.2 ns Histogram: log(frequency) by time 7.39 μs <
Memory estimate: 1.06 KiB, allocs estimate: 1.
julia> @benchmark $Int_16.^2
BenchmarkTools.Trial: 10000 samples with 241 evaluations.
Range (min … max): 312.037 ns … 259.117 μs ┊ GC (min … max): 0.00% … 99.79%
Time (median): 363.174 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.899 μs ± 18.716 μs ┊ GC (mean ± σ): 77.72% ± 7.83%
▄▆██▃ ▂▂▁ ▁ ▂
█████████████▇▆▆▆█▇▇▆▆▅▆▆▅▄▅▆▅▅▄▄▅▄▃▄▄▄▄▆▅▆▆▆▇▇██████▇▇▇▇▆▅▅▄ █
312 ns Histogram: log(frequency) by time 1.35 μs <
Memory estimate: 2.06 KiB, allocs estimate: 1.
julia> @benchmark $Int_32.^2
BenchmarkTools.Trial: 9034 samples with 194 evaluations.
Range (min … max): 520.572 ns … 251.764 μs ┊ GC (min … max): 0.00% … 99.71%
Time (median): 583.026 ns ┊ GC (median): 0.00%
Time (mean ± σ): 2.824 μs ± 22.005 μs ┊ GC (mean ± σ): 77.60% ± 9.84%
▇█▄▁▂▃▁ ▁▁ ▁
████████████▇▆▇▆▅▃▅▄▅▄▃▄▄▄▄▄▁▄▅▆▆▆▆▇▇▇▆▆▅▄▅▅▅▃▄▄▁▁▃▁▄▁▁▁▁▁▁▁▃ █
521 ns Histogram: log(frequency) by time 3.06 μs <
Memory estimate: 4.06 KiB, allocs estimate: 1.
julia> @benchmark $Int_64.^2
BenchmarkTools.Trial: 10000 samples with 44 evaluations.
Range (min … max): 867.795 ns … 653.472 μs ┊ GC (min … max): 0.00% … 99.82%
Time (median): 1.009 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.341 μs ± 36.073 μs ┊ GC (mean ± σ): 68.34% ± 6.30%
▃▆▇██▆▆▄▂▁ ▁ ▂
██████████▇▆▆▇▇▆▇▇▆▅▅▇▇█████▇▇▅▆▄▅▄▃▄▄▃▃▄▁▅▆▇▅▄▅▄▄▅▅▁▄▅▃▃▃▄▄▃ █
868 ns Histogram: log(frequency) by time 2.94 μs <
Memory estimate: 7.94 KiB, allocs estimate: 1.
julia> @benchmark $Int_128.^2
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
Range (min … max): 6.683 μs … 3.352 ms ┊ GC (min … max): 0.00% … 99.74%
Time (median): 7.721 μs ┊ GC (median): 0.00%
Time (mean ± σ): 10.562 μs ± 96.224 μs ┊ GC (mean ± σ): 27.31% ± 2.99%
▇▂▅▅ █▅ ▁
████▇▄▆███▇▅▆▆▅▅▆▃▄▂▄▅▄▄▄▄▄▄▄▅▅▃▄▃▄▄▂▄▂▃▂▄▃▃▆▆▆▆▄▄▆▆▇▇▆▆▆▅▇ █
6.68 μs Histogram: log(frequency) by time 14.9 μs <
Memory estimate: 15.75 KiB, allocs estimate: 1.
julia> @benchmark $UInt_8.^2
BenchmarkTools.Trial: 10000 samples with 967 evaluations.
Range (min … max): 80.428 ns … 10.368 μs ┊ GC (min … max): 0.00% … 98.18%
Time (median): 108.816 ns ┊ GC (median): 0.00%
Time (mean ± σ): 217.285 ns ± 835.138 ns ┊ GC (mean ± σ): 43.71% ± 11.08%
█▃ ▁
██▅▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▄▅▅▅▅ █
80.4 ns Histogram: log(frequency) by time 7.23 μs <
Memory estimate: 1.06 KiB, allocs estimate: 1.
julia> @benchmark $UInt_16.^2
BenchmarkTools.Trial: 8270 samples with 320 evaluations.
Range (min … max): 315.113 ns … 202.421 μs ┊ GC (min … max): 0.00% … 99.80%
Time (median): 361.734 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.899 μs ± 16.395 μs ┊ GC (mean ± σ): 79.09% ± 9.07%
▅▆█▆▂▃▁▂▂▁ ▁
██████████▇▆▆███▇▆▆▆▆▅▅▅▅▅▄▁▅▄▄▄▅▃▄▃▃▄▄▅▆▅▆▆▇▇▇▇▇▇▇█▆▇▆▆▄▆▅▄▆ █
315 ns Histogram: log(frequency) by time 1.35 μs <
Memory estimate: 2.06 KiB, allocs estimate: 1.
julia> @benchmark $UInt_32.^2
BenchmarkTools.Trial: 9148 samples with 194 evaluations.
Range (min … max): 525.278 ns … 236.308 μs ┊ GC (min … max): 0.00% … 99.74%
Time (median): 582.335 ns ┊ GC (median): 0.00%
Time (mean ± σ): 2.782 μs ± 21.821 μs ┊ GC (mean ± σ): 78.12% ± 9.84%
▇█▄▂▂▃▁ ▁ ▁
███████▇████▇▆▆▅▆▆▄▄▄▄▄▄▃▃▃▄▃▃▄▅▅▆▅▅▅▆▆▅▆▅▁▃▄▁▁▁▁▁▁▄▁▁▃▁▁▁▁▁▃ █
525 ns Histogram: log(frequency) by time 2.92 μs <
Memory estimate: 4.06 KiB, allocs estimate: 1.
julia> @benchmark $UInt_64.^2
BenchmarkTools.Trial: 10000 samples with 45 evaluations.
Range (min … max): 859.467 ns … 592.092 μs ┊ GC (min … max): 0.00% … 99.80%
Time (median): 988.933 ns ┊ GC (median): 0.00%
Time (mean ± σ): 3.448 μs ± 36.452 μs ┊ GC (mean ± σ): 70.21% ± 6.60%
▁▃▄█▃
▃▅█████▇▅█▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▂▁▂ ▃
859 ns Histogram: frequency by time 2.3 μs <
Memory estimate: 7.94 KiB, allocs estimate: 1.
julia> @benchmark $UInt_128.^2
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
Range (min … max): 7.691 μs … 4.544 ms ┊ GC (min … max): 0.00% … 99.77%
Time (median): 7.794 μs ┊ GC (median): 0.00%
Time (mean ± σ): 11.989 μs ± 115.585 μs ┊ GC (mean ± σ): 27.18% ± 2.82%
█▄▃▃▂▂▁ ▁ ▁
█████████████▇▇▇▇▇▆▇▆▇▆▇██▇▇██▇▆▇▅▆▅▅▅▆▅▅▅▅▅▄▅▅▃▄▅▅▅▅▅▅▅▄▅▄▅ █
7.69 μs Histogram: log(frequency) by time 20.6 μs <
Memory estimate: 15.75 KiB, allocs estimate: 1.