I have a large file (75 GB) memory-mapped as an array `d`, and I want to copy it into another memory-mapped array `m`. Because I do not have 75 GB of RAM available, I wrote:

for (i,v) in enumerate(d)
    m[i] = v
end

This copies the file value by value, but I only get a copy rate of ~2 MB/s on an SSD where I expect at least 50 MB/s for both reads and writes.

How could I optimize this copy rate?

=== [edit] ===

Following the comments, I changed my code to the version below, which increased the write rate to 15 MB/s:

function copydcimg(m::Array{UInt16,4}, d::Dcimg)
    m .= d
    Mmap.sync!(m)
end

copydcimg(m,d)

At this point, I think I should optimize the Dcimg code. This binary file is made of frames separated by timestamps. Here is the code I use to access the frames:

module dcimg

using Mmap
using TOML

struct Dcimg <: AbstractArray{UInt16,4} # struct allowing to access dcimg file
    filename::String # filename of the dcimg
    header::Int # header size in bytes
    clock::Int # clock size in bytes
    x::Int
    y::Int
    z::Int
    t::Int
    m # linear memory map
    Dcimg(filename, header, clock, x, y, z, t) =
      new(filename, header, clock, x, y, z, t,
        Mmap.mmap(open(filename), Array{UInt16, 3},
            (x*y+clock÷sizeof(UInt16), z, t), header)
        )
end

# the following functions allow a Dcimg to be accessed like an Array
Base.size(D::Dcimg) = (D.x, D.y, D.z, D.t)
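# linear index: skip the clock words stored after each frame
# ((i-1) so that the last pixel of a frame still maps into that frame)
Base.getindex(D::Dcimg, i::Int) =
    D.m[i + ((i-1) ÷ (D.x*D.y))*D.clock÷sizeof(UInt16)]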
Base.getindex(D::Dcimg, x::Int, y::Int, z::Int, t::Int) =
    D[x + D.x*((y-1) + D.y*((z-1) + D.z*(t-1)))]    

# allowing to automatically parse size
function Dcimg(pathtag)
    p = TOML.parsefile(pathtag * ".toml")
    return Dcimg(pathtag * ".dcimg",
        # ...
        )
end

export Dcimg, getframe

end
Hugo Trentesaux
  • Do you execute this snippet in global scope or inside a function? Wrapping it in a function and passing `d` and `m` as parameters may help. – 张实唯 Mar 06 '19 at 12:16
  • If it is just copying, without other code in between: have you compared `copy!`? – phipsgabler Mar 06 '19 at 14:00
  • @张实唯 I'm currently running a script. Why do you think wrapping would change something? – Hugo Trentesaux Mar 06 '19 at 14:47
  • @phg, it's not copying, because I do not want to copy the object. It's physically reading a value in a file and writing it in another one. – Hugo Trentesaux Mar 06 '19 at 14:47
  • @HugoTrentesaux, `copy!` does read values from one of your Mmap arrays and write them into the other, assuming your Mmap arrays hold a bits type, which AFAIK they must. If you run `copy!(m, d); Mmap.sync!(m)` or `m .= d; Mmap.sync!(m)`, the disk content of `m` should be updated. If you put your snippet into a function, it will be compiled and run faster. Your snippet, without being wrapped in a function, may be creating a bottleneck, although I would not expect that bottleneck to be as bad as 2 MB/s. – hckr Mar 06 '19 at 18:33
  • @HugoTrentesaux Because it's the first tip in [Performance Tips](https://docs.julialang.org/en/v1/manual/performance-tips/index.html#Avoid-global-variables-1) – 张实唯 Mar 07 '19 at 04:59
  • Thanks all. I'm new to Julia and am trying to get at least what I get in Matlab (~50 MB/s). Thanks for the performance tips. – Hugo Trentesaux Mar 07 '19 at 08:39
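
As an illustration of the advice in these comments, here is a minimal sketch of the original element-wise loop wrapped in a function, so that the compiler can specialize on the concrete types of its arguments. The name `copy_elementwise!` is hypothetical, and small in-memory arrays stand in for the memory-mapped `d` and `m`.

```julia
# Hypothetical helper: the same value-by-value loop as in the question, but inside
# a function, so `dst` and `src` are typed arguments rather than global variables.
function copy_elementwise!(dst::AbstractArray, src::AbstractArray)
    length(dst) == length(src) || throw(DimensionMismatch("lengths differ"))
    for (i, v) in enumerate(src)
        dst[i] = v   # linear indexing, same order as iteration over `src`
    end
    return dst
end

# Small in-memory stand-ins for the memory-mapped arrays (assumption for the demo).
src = rand(UInt16, 4, 3, 2, 2)
dst = zeros(UInt16, 4, 3, 2, 2)
copy_elementwise!(dst, src)
```

This only removes the global-scope overhead; any cost hidden inside `getindex` of the source array is still paid once per element.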

1 Answer

I got it! The solution was to copy the file chunk by chunk, for instance frame by frame (each frame is around 1024×720 UInt16 values). This way I reached 300 MB/s, which I didn't even know was possible in a single thread. Here is the code.

In module dcimg, I added methods to access the file frame by frame:

# get frame number n (starting from 1)
getframe(D::Dcimg,n::Int) = 
    reshape(D.m[
        D.x*D.y*(n-1)+1 + (n-1)*D.clock÷sizeof(UInt16) : # cosmetic line break
        D.x*D.y*n + (n-1)*D.clock÷sizeof(UInt16)
        ], D.x, D.y)
# get frame for layer z, time t (both starting from 1)
getframe(D::Dcimg,z::Int,t::Int) = 
    getframe(D, z + D.z*(t-1)) # frame number is 1-based, so (z,t) = (1,1) gives n = 1

Then I iterate over the frames in a loop:

function copyframes(m::Array{UInt16,4}, d::Dcimg)
    N = d.z*d.t
    F = d.x*d.y
    for i in 1:N
        m[(i-1)*F+1:i*F] = getframe(d, i)
    end
end

copyframes(m,d)
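
A possible variant of the loop above: `getframe` indexes `D.m` with a range, which allocates a temporary array for every frame. The sketch below copies each frame directly with `copyto!` instead. It is not the code I benchmarked; the name `copyframes_noalloc!` is hypothetical, and it assumes the raw map has the layout used in the constructor, i.e. a `(pixels + clock words, z, t)` array with the clock words stored after each frame.

```julia
# Hypothetical allocation-free variant of `copyframes`.
# `raw` is the raw 3-D map (pixels + clock words per frame, z, t),
# `framelen` is the number of pixels per frame (x*y).
function copyframes_noalloc!(dst::Array{UInt16}, raw::Array{UInt16,3}, framelen::Int)
    stride = size(raw, 1)                  # pixels + clock words per frame
    nframes = size(raw, 2) * size(raw, 3)  # z * t
    length(dst) == framelen * nframes || throw(DimensionMismatch("destination size"))
    for n in 1:nframes
        # copy `framelen` pixels of frame n, skipping the clock words that follow it
        copyto!(dst, (n - 1) * framelen + 1, raw, (n - 1) * stride + 1, framelen)
    end
    return dst
end

# In-memory stand-ins: 6 pixels (x=3, y=2) plus 2 clock words per frame, z=3, t=2.
raw = rand(UInt16, 6 + 2, 3, 2)
dst = zeros(UInt16, 3, 2, 3, 2)
copyframes_noalloc!(dst, raw, 6)
```

With the real data, the call would presumably be `copyframes_noalloc!(m, d.m, d.x*d.y)`, provided the `m` field is a concretely typed `Array{UInt16,3}`.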

Thanks all in comments for leading me to this.

===== edit =====

for further reading, you might look at:

which give hints about the optimal block size to copy at a time.

Hugo Trentesaux
  • I don't think these modifications were really necessary, provided that you access the mmap array with a linear scheme. I think the real problem was that you do not give `Dcimg.m` a type, which makes your code type-unstable and every access to `m` slower. Could you test the speed on a smaller file (I wouldn't want you to wear out your SSD) after declaring the field as `m::Array{UInt16, 3}` in your `struct Dcimg`? I also could not understand your `getindex` scheme, specifically `Base.getindex(D::Dcimg, i::Int)`, although this function is not used during the copy operation, AFAIK. – hckr Mar 07 '19 at 10:41
  • You're right, adding the ::Array{UInt16, 3} allowed me to reach 150MB/s, which is fine. I think the additional increase in speed was due to reading block by block instead of switching quickly between read and write, but I can't be sure. – Hugo Trentesaux Mar 07 '19 at 11:16
  • I think you are also right! I did not consider that cost. But for type stability, please remember to give a type to your `struct` fields. Otherwise, your functions will be type-unstable and eventually slow. – hckr Mar 07 '19 at 11:48
  • Thanks for the advice. I thought the type could be inferred from the constructor, because it's the only way the field can be set in this example, but I understand my mistake now. – Hugo Trentesaux Mar 07 '19 at 13:14
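
To make the point about field types concrete, here is a small sketch comparing an untyped field with a concretely typed one. The struct names `UntypedWrapper` and `TypedWrapper` are hypothetical and only mirror the `m` field of `Dcimg`.

```julia
# Untyped field: its declared type is `Any`, so the compiler cannot know what
# `w.m` holds, whatever the constructor happens to store in it.
struct UntypedWrapper
    m
end

# Concretely typed field: every access to `w.m` is known to be an Array{UInt16,3}.
struct TypedWrapper
    m::Array{UInt16,3}
end

untyped_field = fieldtype(UntypedWrapper, :m)   # Any
typed_field = fieldtype(TypedWrapper, :m)       # Array{UInt16, 3}

println(isconcretetype(untyped_field))  # prints "false"
println(isconcretetype(typed_field))    # prints "true"
```

Running `@code_warntype` on a function that reads the field is another way to spot the instability in practice.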