I want to generate an empty array of dataframes that will be filled later in the code, but I have not figured out how to do it. Any help would be appreciated!

I have tried a standard way of defining an empty array.

julia> df = Array{DataFrame}(undef,10)
10-element Array{DataFrame,1}:
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef

julia> println(typeof(df[1]))
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex(::Array{DataFrame,1}, ::Int64) at ./array.jl:729
 [2] top-level scope at none:0

I'd expected typeof(df[1]) to say DataFrame, but it fails with an error message.

JPi

2 Answers

Try:

df_vector = [DataFrame() for _ in 1:10]

or

map(_ -> DataFrame(), 1:10)
Bogumił Kamiński
  • I was able to do `df_vector = DataFrame[]`, which is an empty array of DataFrame types, or `Vector{DataFrame}`. I added dataframes in a loop with `push!(df_vector, df1)` and so on. Now what I don't know how to do is access column 3 in each of the dataframes in my vector... – Merlin Dec 08 '22 at 08:28
  • e.g. `[df[:, 3] for df in df_vector]` or `getindex.(df_vector, :, 3)`. – Bogumił Kamiński Dec 08 '22 at 09:34
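The push-then-index workflow from the comments above can be sketched as follows. This is only an illustration: the column names `a`, `b`, `c` and the sample values are made up, and it assumes the DataFrames package is installed.

```julia
using DataFrames

# Start from an empty, typed vector and grow it with push!.
df_vector = DataFrame[]
for i in 1:3
    # Hypothetical sample data: three columns named a, b, c.
    push!(df_vector, DataFrame(a = 1:2, b = 3:4, c = [10i, 20i]))
end

# Third column of every dataframe, collected into a vector of vectors.
third_cols = [df[:, 3] for df in df_vector]

# Broadcast form of the same lookup; `:` and `3` are treated as scalars.
third_cols_bc = getindex.(df_vector, :, 3)
```

Both forms copy the column out of each dataframe, so changing `third_cols` afterwards does not touch the dataframes themselves.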

What you have is correct for one reading of 'empty': `Array{DataFrame}(undef, 10)` allocates ten slots that do not hold a value yet, which is why reading `df[1]` throws `UndefRefError`. Once you have your first result, you can assign dataframes to the slots as normal. It is indeed a DataFrame array: if you try to assign a value of any other type to one of its elements, you will get an error.

Note that "an empty array of dataframes" is not the same as "a (non-empty) array of empty dataframes".

If what you actually want is the latter, Bogumił's answer is the way to go.
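A minimal sketch of that point, assuming the DataFrames package is available. The column name `x` and the helper variable `wrong_type_rejected` are only for illustration; `isassigned` is the Base function for checking a slot without triggering the `UndefRefError`.

```julia
using DataFrames

dfs = Array{DataFrame}(undef, 3)   # three slots, none assigned yet

eltype(dfs)          # DataFrame, even though no slot holds a value
isassigned(dfs, 1)   # false: reading dfs[1] now would throw UndefRefError

dfs[1] = DataFrame(x = [1, 2])     # fill a slot later in the code
isassigned(dfs, 1)   # true
typeof(dfs[1])       # DataFrame

# Assigning a value that cannot be converted to DataFrame throws.
wrong_type_rejected = try
    dfs[2] = "not a dataframe"
    false
catch
    true
end
```

So the element type is fixed at construction time, and only reading an unassigned slot fails.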

Tasos Papastylianou
  • ah, you're right of course..... Though there's an asymmetry in that Array{Float64}(undef,10) would not have the same issues. – JPi Jul 15 '19 at 19:23
  • @JPi yes, but that's probably mostly a "C++" accident, and probably less desirable behaviour in any case. All other non-primitive objects will behave like dataframe. (try `Dict` for example, or even `Array`!) – Tasos Papastylianou Jul 15 '19 at 19:30
  • @JPi in fact, this historical accident may be the very reason there's an `isnothing` and an `ismissing` command, but no `isundef` one ... – Tasos Papastylianou Jul 15 '19 at 19:36
  • An "empty array of `DataFrame`s" is `DataFrame[]`, as this is what "empty" means for the `isempty` function. But I think this is not what @JPi wanted. Now we have a distinction between an "uninitialized array of `DataFrame`s" and an "array of empty `DataFrame`s", as @Tasos rightly pointed out. The key thing is that you probably want all `DataFrame`s to be distinct objects (not the same object repeated multiple times), and this is what my solution does. You could repeat the same `DataFrame` 10 times using `fill(DataFrame(), 10)`, but this is probably not what you want. – Bogumił Kamiński Jul 15 '19 at 20:43
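The distinctions drawn in these comments can be checked directly. The sketch below assumes the DataFrames package is installed; all variable names and the column name `x` are made up for illustration.

```julia
using DataFrames

empty_vec = DataFrame[]                   # zero elements
uninit    = Vector{DataFrame}(undef, 3)   # three unassigned slots
distinct  = [DataFrame() for _ in 1:3]    # three separate empty dataframes
shared    = fill(DataFrame(), 3)          # one dataframe referenced three times

isempty(empty_vec)            # true
isempty(uninit)               # false: it has length 3
distinct[1] === distinct[2]   # false
shared[1] === shared[2]       # true

# Adding a column through one "shared" slot shows up in every slot.
shared[1][!, :x] = [1, 2]
ncol(shared[2])               # 1
ncol(distinct[2])             # 0

# The asymmetry with Float64: its undef slots hold arbitrary but readable
# values, while slots of a non-bits element type start out unassigned.
isassigned(Vector{Float64}(undef, 3), 1)       # true
isassigned(Vector{Vector{Int}}(undef, 3), 1)   # false
```

The comprehension calls `DataFrame()` once per element, whereas `fill` evaluates its argument once and stores the same reference in every slot.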