
I have a massive dataset that I divided into k mini datasets, where k=100. Now I want to store these mini datasets in different files. To store the full dataset I used the following instructions:

using JLD, HDF5  # install first with Pkg.add("JLD") and Pkg.add("HDF5")

X = rand(100000)
file = jldopen("path to my file/mydata.jld", "w")  # the .jld extension requires the JLD and HDF5 packages
write(file, "X", X)  # alternatively, say "@write file X"
close(file)

Now I divided my dataset into k sub-datasets, where k=100:

function get_mini_batch(X)
    # number of mini-batches of (at most) 100 elements each
    mini_batches = ceil(Int, length(X) / 100)

    for i = 1:mini_batches
        mini_batch = X[((i-1)*100 + 1):min(i*100, end)]
        # the same filename is used in every iteration
        file = jldopen("/path to my file/mydata.jld", "w")
        write(file, "mini_batch", mini_batch)  # alternatively, say "@write file mini_batch"
        close(file)
    end
end

But this function stores all the sub-datasets in a single file, which is overwritten at each iteration. What I want instead is one file per iteration:

file= jldopen("/path to my file/mydata1.jld", "w")  # at each iteration I want to get a new file: mydata1, mydata2, ..., mydata100
file= jldopen("/path to my file/mydata2.jld", "w")
file= jldopen("/path to my file/mydata3.jld", "w")
file= jldopen("/path to my file/mydata4.jld", "w")
.
.
.
file= jldopen("/path to my file/mydata100.jld", "w")

Alternatively, I tried out this procedure:

function get_mini_batch(X)
    mini_batches = ceil(Int, length(X) / 100)

    for i = 1:mini_batches
        mini_batch[i] = X[((i-1)*100 + 1):min(i*100, end)]
        file[i] = jldopen("/path to my file/mydata.jld", "w")
        write(file, "mini_batch", mini_batch)  # alternatively, say "@write file mini_batch"
        close(file)
    end
end

But I don't know how to use the loop variable i = 1, ..., 100 in the filename on this line: file[i]= jldopen("/path to my file/mydata(i).jld", "w")

Joey
vincet

2 Answers


You are looking for string formatting.

To create the filenames, you can use @sprintf(). Then you can use these strings to write your objects to disk.

julia> using Printf  # Needed in Julia 1.0 and later
julia> @sprintf("myfilename%02d.jld", 5)
"myfilename05.jld"

Example in a loop:

julia> for i in 1:3
           println(@sprintf("myfilename%03d.jl", i))
       end
myfilename001.jl
myfilename002.jl
myfilename003.jl

I used %03d here to show how you can add leading zeros to your file names. This helps later on, because zero-padded names sort in numeric order.
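Putting the formatted filename into the loop from the question could look like the sketch below. It is only an illustration under a few assumptions: `save_mini_batches` and its `batch_size` keyword are hypothetical names, the files go to a temporary directory, and the standard-library `Serialization` module stands in for JLD so that the example runs without extra packages.

```julia
using Printf
using Serialization  # stdlib; assumption: used here in place of JLD

# Hypothetical helper: split X into chunks of `batch_size` elements and write
# each chunk to its own zero-padded file in `dir`. Returns the paths written.
function save_mini_batches(X::AbstractVector, dir::AbstractString; batch_size::Int = 100)
    n_batches = cld(length(X), batch_size)  # ceiling division
    paths = String[]
    for i in 1:n_batches
        mini_batch = X[((i - 1) * batch_size + 1):min(i * batch_size, end)]
        # A different filename per iteration: mydata001.jls, mydata002.jls, ...
        path = joinpath(dir, @sprintf("mydata%03d.jls", i))
        open(path, "w") do io
            serialize(io, mini_batch)
        end
        push!(paths, path)
    end
    return paths
end

X = rand(1_000)
paths = save_mini_batches(X, mktempdir())
```

With JLD, the `open`/`serialize` step would be replaced by the `jldopen(path, "w")`, `write`, and `close` calls from the question, with `path` built the same way.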

Julia Learner
niczky12
  • But how can I open a file, write to it, and store it using @sprintf()? For example: file = jldopen("path to my file/mydata.jld", "w") # the extension of file is jld so you should add packages JLD and HDF5, Pkg.add("JLD"), Pkg.add("HDF5"), write(file, "X", X) # alternatively, say "@write file A" – vincet Jun 24 '16 at 12:14
  • You can use `@sprintf` to specify the filename. For example, in the second code block of your question, replace `file= jldopen("/path to my file/mydata.jld", "w")` with `file= jldopen(@sprintf("/path to file/mydata%d.jld", i), "w")`, where `i` is the number of the mini-batch you are looping over. – niczky12 Jun 24 '16 at 12:16
  • My question is related to this topic: "http://stackoverflow.com/questions/37989159/how-to-divide-my-data-into-distincts-mini-batches-randomly-julia". I want to create files to store the X[P] values. How should I proceed to solve that? – vincet Jun 27 '16 at 13:48

I agree with niczky12 that you are looking for string formatting. But I would personally write it this alternative way:

"/path to my file/mydata$i.jld"

instead of using @sprintf.

Example:

julia> i = 4
4

julia> "/path/mydata$i.jld"
"/path/mydata4.jld"
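If leading zeros are wanted with interpolation as well, one option is Base's `lpad`. The sketch below is an illustration only; the variable names are made up for the example, and the last two lines show why padded names are easier to sort.

```julia
i = 4

# Plain interpolation, as in the example above
plain = "mydata$i.jld"

# Interpolation combined with lpad to pad the number to three digits
padded = "mydata$(lpad(i, 3, '0')).jld"

# Unpadded names sort lexicographically (1, 10, 2); padded names sort numerically
unpadded_names = ["mydata$k.jld" for k in (1, 2, 10)]
padded_names = ["mydata$(lpad(k, 3, '0')).jld" for k in (1, 2, 10)]
```

Calling `sort` on `unpadded_names` puts `mydata10.jld` before `mydata2.jld`, while `padded_names` is already in sorted order.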
Fengyang Wang
  • Yep, this is the easy way. I just prefer to have the leading zeros in this case, hence I used `@sprintf`. :) – niczky12 Jun 24 '16 at 13:36