Julia: A fast and elegant way to get a matrix from an array of arrays

Question

There is an array of arrays containing more than 10,000 pairs of Float64 values. Something like this:

v = [[rand(),rand()], ..., [rand(),rand()]]

I want to get a two-column matrix from it. It is possible to iterate over all the pairs with a loop; it looks cumbersome, but gives the result in a fraction of a second:

x = Vector{Float64}()
y = Vector{Float64}()
for i = 1:length(v)
    push!(x, v[i][1])
    push!(y, v[i][2])
end
w = hcat(x,y)

The solution permutedims(reshape(hcat(v...), (length(v[1]), length(v)))), which I found in this question, looks more elegant but hangs Julia completely, so I have to restart the session. Perhaps it was optimal six years ago, but it no longer works for large arrays. Is there a solution that is both compact and fast?

I don't understand why your loop example creates two vectors, `x` and `y`. Why not just create a matrix and then write the values straight into that? Seems much more direct? — DNF, May 17 '21 at 21:30

Bogumił Kamiński · Accepted Answer · 2021-05-17T20:36:05.287

12

I hope this is short and efficient enough for you:

 getindex.(v, [1 2])

and if you want something simpler to digest:

[v[i][j] for i in 1:length(v), j in 1:2]

Also the hcat solution could be written as:

permutedims(reshape(reduce(hcat, v), (length(v[1]), length(v))));

and it should not hang your Julia (please confirm - it works for me).
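To illustrate the difference on a small input, here is a minimal sketch. The three-element `v` is a stand-in for the question's data, not the real 10,000-pair array. Splatting turns every inner vector into a separate argument of one `hcat` call, whereas `reduce(hcat, v)` passes the outer vector as a single argument; as far as I know, Base has a specialized method for this pattern.

```julia
# Small stand-in for the question's data (assumption: three pairs only).
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Splatting expands to hcat(v[1], v[2], v[3]): one argument per inner vector.
A = hcat(v...)

# reduce(hcat, v) builds the same matrix from a single argument.
B = reduce(hcat, v)

@show A == B
@show size(B)          # each inner vector became a column

# Transposing gives one row per inner vector.
W = permutedims(B)
@show size(W)
```

With 10,000 inner vectors, the splatted form becomes a single call with 10,000 arguments, which is the part that `reduce` avoids.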

@Antonello: to understand why this works consider a simpler example:

julia> string.(["a", "b", "c"], [1 2])
3×2 Matrix{String}:
 "a1"  "a2"
 "b1"  "b2"
 "c1"  "c2"

I am broadcasting a column Vector ["a", "b", "c"] and a 1-row Matrix [1 2]. The point is that [1 2] is a Matrix. Thus broadcasting expands both along rows (forced by the Vector) and along columns (forced by the Matrix). For this expansion to happen, it is crucial that the [1 2] matrix has exactly one row. Is this clearer now?
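The same reasoning can be checked directly on the getindex case. This is a small sketch with made-up data (three pairs instead of the question's 10,000); it also shows that a plain Vector [1, 2] does not give the column expansion.

```julia
# Made-up stand-in for the question's data.
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# [1 2] (space-separated) is a 1×2 Matrix; [1, 2] would be a 2-element Vector.
cols = [1 2]
@show size(v) size(cols)

# A length-3 Vector broadcast against a 1×2 Matrix gives a 3×2 result,
# where element (i, j) is getindex(v[i], cols[1, j]), that is v[i][j].
M = getindex.(v, cols)
@show size(M)
@show M == [1.0 2.0; 3.0 4.0; 5.0 6.0]

# With a Vector of indices, both arguments vary along the same dimension,
# so lengths 3 and 2 cannot be broadcast together.
err = try
    getindex.(v, [1, 2])
catch e
    e
end
@show err isa DimensionMismatch
```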

edited May 17 '21 at 20:36

answered May 17 '21 at 18:11

Bogumił Kamiński


What does the broadcasted getindex do? I would have expected a vector as output, as it is a broadcast operation. – Antonello May 17 '21 at 19:34

I will explain in the answer. – Bogumił Kamiński May 17 '21 at 20:15
Is the `reshape` doing anything? `permutedims(reduce(hcat, v))` should work, as will (for arrays of real numbers) `reduce(vcat, v')`. – mcabbott May 18 '21 at 01:05
You are right in both cases - I just focused on resolving the splatting issue - without checking the code. – Bogumił Kamiński May 18 '21 at 14:04
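The two simplifications from the comment above can be checked on a small input. This is only a sketch with made-up data: it shows that the `reshape` leaves the result of `reduce(hcat, v)` unchanged, and that `reduce(vcat, v')` agrees with the `permutedims` form for real-valued data.

```julia
# Made-up stand-in for the question's data.
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

H = reduce(hcat, v)                          # already length(v[1]) × length(v)
R = reshape(H, (length(v[1]), length(v)))    # same shape, so nothing changes
@show size(H) == size(R)
@show H == R

# Form 1: drop the reshape and just transpose.
W1 = permutedims(H)

# Form 2: v' is a 1×3 row whose elements are 1×2 adjoint rows; stacking them
# vertically gives the 3×2 matrix. Adjoint conjugates, hence "real numbers only".
W2 = reduce(vcat, v')

@show W1 == W2
```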

score 3 · Answer 2 · answered May 18 '21 at 08:02

Your own example is pretty close to a good solution, but does some unnecessary work, by creating two distinct vectors, and repeatedly using push!. This solution is similar, but simpler. It is not as terse as the broadcasted getindex by @BogumilKaminski, but is faster:

function mat(v)
    M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
    for i in eachindex(v)
        M[i, 1] = v[i][1]
        M[i, 2] = v[i][2]
    end
    return M
end

You can simplify it a bit further, without losing performance, like this:

function mat_simpler(v)
    M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
    for (i, x) in pairs(v)
        M[i, 1], M[i, 2] = x
    end
    return M
end
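Both functions above hard-code two columns. If the inner vectors can have another length, the same preallocate-and-fill idea could be sketched as follows. `mat_general` is a hypothetical name, not something from the answers, and the sketch assumes all inner vectors are ordinary 1-based vectors of equal length. It has not been benchmarked against the versions above.

```julia
# Hypothetical helper (not from the answers): like `mat`, but the column count
# is taken from the first inner vector instead of being fixed at 2.
function mat_general(v)
    isempty(v) && throw(ArgumentError("need at least one inner vector"))
    ncols = length(first(v))
    M = Matrix{eltype(eltype(v))}(undef, length(v), ncols)
    for (i, x) in pairs(v)
        # Refuse ragged input rather than leaving uninitialized entries in M.
        length(x) == ncols ||
            throw(DimensionMismatch("inner vector $i has length $(length(x)), expected $ncols"))
        for j in 1:ncols
            M[i, j] = x[j]
        end
    end
    return M
end

# Usage on made-up data with three columns.
G = mat_general([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
@show size(G)
```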

score 1 · Answer 3 · answered May 19 '21 at 06:25

A benchmark of the various solutions posted so far...

using BenchmarkTools
# Creating the vector
v = [[i, i+0.1] for i in 0.1:0.2:2000]

M1 = @btime vcat([[e[1] e[2]] for e in $v]...)
M2 = @btime getindex.($v, [1 2])
M3 = @btime [v[i][j] for i in 1:length($v), j in 1:2]
M4 = @btime permutedims(reshape(reduce(hcat, $v), (length($v[1]), length($v))))
M5 = @btime permutedims(reshape(hcat($v...), (length($v[1]), length($v))))

function original(v)
    x = Vector{Float64}()
    y = Vector{Float64}()
    for i = 1:length(v)
        push!(x, v[i][1])
        push!(y, v[i][2])
    end
    return hcat(x,y)
end
function mat(v)
    M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
    for i in eachindex(v)
        M[i, 1] = v[i][1]
        M[i, 2] = v[i][2]
    end
    return M
end
function mat_simpler(v)
    M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
    for (i, x) in pairs(v)
        M[i, 1], M[i, 2] = x
    end
    return M
end

M6 = @btime original($v)
M7 = @btime mat($v)
M8 = @btime mat_simpler($v)

M1 == M2 == M3 == M4 == M5 == M6 == M7 == M8 # true

Output:

  1.126 ms (10010 allocations: 1.53 MiB)      # M1
  54.161 μs (3 allocations: 156.42 KiB)       # M2
  809.000 μs (38983 allocations: 765.50 KiB)  # M3
  98.935 μs (4 allocations: 312.66 KiB)       # M4
  244.696 μs (10 allocations: 469.23 KiB)     # M5
  219.907 μs (30 allocations: 669.61 KiB)     # M6
  34.311 μs (2 allocations: 156.33 KiB)       # M7
  34.395 μs (2 allocations: 156.33 KiB)       # M8

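Note that the dollar sign in the benchmarked code interpolates the vector into the expression, so that @btime treats it as a local variable rather than as a non-constant global. In M3 the inner v[i][j] is not interpolated, which likely accounts for its much higher allocation count.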
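A minimal sketch of the two forms side by side, using the same BenchmarkTools package as the benchmark above. The vector `x` and the use of `sum` are made up for illustration, and no timings are shown because they depend on the machine.

```julia
using BenchmarkTools   # external package, as in the benchmark above

x = rand(1000)         # a non-constant global, like `v` in the benchmark

# Without `$`, `x` is looked up as an untyped global inside the benchmark.
s1 = @btime sum(x)

# With `$`, the value of `x` is spliced into the benchmarked expression,
# so it behaves like a local variable.
s2 = @btime sum($x)

# @btime returns the value of the expression, so both results can be compared.
@show s1 == s2
```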
