
I have a vector of 100000 examples whose values are -1 and 1. I want to randomly split this data into 16 distinct mini-batches of 6250 examples each.

Here is my code to generate the vector of 100000 examples, which is stored in a file.

The question of how to divide my data into different parts is answered by Dan.

Now I want to store [X[p] for p in parts] in separate files, one file per part. That is, if I have 3 parts, I want to create 3 files and store the values of each part in its own file. How can I do that?

workspace()
# Pkg.add("JLD") and Pkg.add("HDF5"): these two packages are needed to store our vectors in .jld files
using JLD, HDF5
#import HTreeRBM

# m is the length of the vector (for instance m = 100000) and k is the number of partitions (say k = 16)
function gen_random(m, k)
    s = rand(m)

    # convert each random number to -1 or 1
    X = float_to_binary(s)

    parts = kfoldperm(length(X), k)

    for p in 1:length(parts)
        file = jldopen(@sprintf("my path to file/mini_batch%d.jld", p), "w")
        write(file, "X", [X[p] for p in parts])  # writes all the parts into every file, not just part p
        close(file)
    end
    return [X[p] for p in parts]
end

function float_to_binary(s, level=0.4)
    for i = 1:length(s)
        s[i] = s[i] > level ? 1.0 : -1.0
    end
    file = jldopen("/home/anelmad/Desktop/stage-inria/code/HTreeRBM.jl/artificial_data/mydata.jld", "w")
    write(file, "s", s)  # alternatively, say "@write file A"
    close(file)
    return s
end

function kfoldperm(l, k)
    n, r = divrem(l, k)
    b = collect(1:n:l+1)
    for i in 1:length(b)
        b[i] += i > r ? r : i-1
    end
    p = randperm(l)
    return [p[r] for r in [b[i]:b[i+1]-1 for i=1:k]]
end
vincet

1 Answer


Define kfoldperm by running:

function kfoldperm(N,k)
    n,r = divrem(N,k)
    b = collect(1:n:N+1)
    for i in 1:length(b)
        b[i] += i > r ? r : i-1  
    end
    p = randperm(N)
    return [p[r] for r in [b[i]:b[i+1]-1 for i=1:k]]
end

Now,

v = rand(10)
parts = kfoldperm(10,3)
[v[p] for p in parts]

will give you a partition of v into 3 parts.
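As a quick sanity check for the sizes in the question, here is a minimal sketch that runs the same `kfoldperm` on 100000 indices with 16 parts and confirms that the parts are equally sized and cover every index exactly once. It assumes Julia 1.x, where `randperm` lives in the `Random` standard library (in 2016 it was available without an import); the variable names `sizes`, `covered` and `uneven` are only illustrative.

```julia
using Random  # assumption: Julia 1.x, where randperm is provided by the Random stdlib

# Same function as defined above, repeated so the snippet runs on its own.
function kfoldperm(N, k)
    n, r = divrem(N, k)
    b = collect(1:n:N+1)
    for i in 1:length(b)
        b[i] += i > r ? r : i-1   # shift boundaries so the first r parts get one extra index
    end
    p = randperm(N)
    return [p[r] for r in [b[i]:b[i+1]-1 for i=1:k]]
end

parts = kfoldperm(100_000, 16)
sizes = map(length, parts)              # 16 parts of 6250 indices each
covered = sort(vcat(parts...))          # every index from 1 to 100000, each exactly once

# When N is not divisible by k, the first r parts are one element longer.
uneven = map(length, kfoldperm(10, 3))  # → [4, 3, 3]
```

For `kfoldperm(10, 3)` the boundaries start as `[1, 4, 7, 10]` and become `[1, 5, 8, 11]` after the loop, which gives the index ranges 1:4, 5:7 and 8:10 into the random permutation.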

Dan Getz
  • `kfoldperm(10,3)` is not recognized by the Julia compiler. Is that function supported by Julia? What do the parameters N, k, n, r represent? Thank you for your help – vincet Jun 23 '16 at 13:27
  • @samtzaurtis Note that Dan Getz defined the function `kfoldperm` above. It is not a built-in function. – Fengyang Wang Jun 23 '16 at 13:39
  • The parameter `N` is the number of elements in the dataset. `k` is the number of parts. The return value is a vector with `k` elements. Each element is a vector of indices (between 1 and `N`) which you can use to index into the dataset. – Dan Getz Jun 23 '16 at 14:34
  • When I run `parts = kfoldperm(10,3)` it returns this error: `ERROR: MethodError: no method matching colon(::Int64, ::Array{Float64,1}, ::Array{Float64,1})`. Closest candidates are: `colon{T<:Real}(::T<:Real, ::Any, ::T<:Real)`, `colon{A<:Real,C<:Real}(::A<:Real, ::Any, ::C<:Real)`, `colon{T}(::T, ::Any, ::T)`. The problem comes from the line `b = collect(1:n:N+1)` – vincet Jun 24 '16 at 09:00
  • I also don't understand what this line does: `for i in 1:length(b) b[i] += i > r ? r : i-1 end` – vincet Jun 24 '16 at 09:15
  • Here is the answer on how to randomly permute a vector using collections: http://stackoverflow.com/questions/38010052/julia-how-to-permute-randomly-a-vector-in-julia – vincet Jun 24 '16 at 09:55
  • From the error message, it seems you are calling `kfoldperm` with floating point vectors as parameters (which 10 and 3 in the example are not). Try the exact example first and it will clear things up, perhaps in a clean Julia environment. As for the `b[i] += `... line, it adjusts by 1 the length of some parts to account for the cases where the length of the vector does not divide evenly by the number of folds. – Dan Getz Jun 24 '16 at 11:57
  • Looking at the code in the question, I can see you are calling the function with `Z=kfoldperm(Y,k)`. It should be `Z=kfoldperm(length(Y),k)`. The parameters are integers! And the next line should use `Z` instead of `parts`. `parts` was just the name of a variable with the return value of `kfoldperm`, which you changed to `Z`. And you are doing nothing with the partitioned vector. Try stuff in the REPL a bit to get a feeling for what the return value is. – Dan Getz Jun 24 '16 at 12:12
  • Thank you a lot, Dan Getz. I understand better now and it's working. But now I want to store `[v[p]]` in different files, one file per part. So if I have, for example, 3 parts, I want to create 3 files to store the values of each part. I hope that is clearer. – vincet Jun 27 '16 at 13:34
  • I want to store X[1] in file 1, X[2] in file 2, X[3] in file 3, and so on. In my loop, I store all of X[1], X[2], X[3] in each of the different files – vincet Jun 27 '16 at 14:47
  • It might be best to have the file question in another separate StackOverflow question. – Dan Getz Jun 27 '16 at 16:56
  • OK, I'll do it. Since it's related, I will only share the code related to the problem of storing values in files – vincet Jul 01 '16 at 14:14
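
For the file-per-part request raised in the comments, one possible sketch is to index `X` with only the i-th index vector inside the loop, rather than writing the whole comprehension `[X[p] for p in parts]` each time. This assumes Julia 1.x with the JLD package installed, and reuses the `jldopen`/`write`/`close` calls from the question. The helper name `save_minibatches` and the temporary output directory are hypothetical, chosen only for illustration.

```julia
using Random, Printf   # assumption: Julia 1.x stdlibs for randperm and @sprintf
using JLD              # assumption: JLD is installed, as in the question

# Same kfoldperm as in the answer, repeated so the snippet runs on its own.
function kfoldperm(N, k)
    n, r = divrem(N, k)
    b = collect(1:n:N+1)
    for i in 1:length(b)
        b[i] += i > r ? r : i-1
    end
    p = randperm(N)
    return [p[r] for r in [b[i]:b[i+1]-1 for i=1:k]]
end

# Hypothetical helper: split X into k random parts and write each part to its own file.
function save_minibatches(X, k, dir)
    parts = kfoldperm(length(X), k)
    paths = String[]
    for (i, idx) in enumerate(parts)
        path = joinpath(dir, @sprintf("mini_batch%d.jld", i))
        file = jldopen(path, "w")
        write(file, "X", X[idx])   # only the i-th part goes into the i-th file
        close(file)
        push!(paths, path)
    end
    return paths
end

# Illustrative data: 100000 values in {-1.0, 1.0}, thresholded at 0.4 as in the question.
X = [rand() > 0.4 ? 1.0 : -1.0 for _ in 1:100_000]
paths = save_minibatches(X, 16, mktempdir())   # 16 files, 6250 values each
```

Each file can then be read back with `jldopen(path, "r")` followed by `read(file, "X")`.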