
I have a Julia data frame:

df=DataFrame("Category" => ["A", "B", "C"], "n" => [1,2,3])
3×2 DataFrame
 Row │ Category  n     
     │ String    Int64 
─────┼─────────────────
   1 │ A             1
   2 │ B             2
   3 │ C             3

and I would like to generate a data frame in which each row of df is repeated n times, like this:

df2=DataFrame("Category" => ["A", "B","B","C","C","C"])
6×1 DataFrame
 Row │ Category 
     │ String   
─────┼──────────
   1 │ A
   2 │ B
   3 │ B
   4 │ C
   5 │ C
   6 │ C

I wrote a function that works fine, but I assume there is a more elegant way to do this. Here is my function:

function repeat_df_rows(df)
    @eval function Base.repeat(df_row::DataFrameRow{DataFrame, DataFrames.Index}; inner::Int64)
        rows = repeat(DataFrame(df_row), inner)
    end

    dfs = map(x -> repeat(x; inner = x.:n), eachrow(df))
    result = vcat(dfs..., cols=:union)
    result = result[:,Not(:n)]
end

Another problem with this function is that it always throws an error on the first call when run in a script. I assume this is because the expression after the @eval macro is not executed immediately.
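A minimal sketch of the likely cause, assuming Julia's world-age rule is what is biting here: the @eval does run immediately, but a method added while a function is executing is not visible to calls made later in that same function call. In the question, map calls repeat(x; inner = x.:n) in the same call that ran @eval, so the first call fails with a MethodError and later calls succeed. The names toy_kind and add_method_then_call are hypothetical. Base.invokelatest is one way to reach the new method, although for this task avoiding @eval altogether is simpler.

```julia
# A toy function with a single method, for Int arguments (hypothetical name).
toy_kind(x::Int) = "int"

function add_method_then_call()
    # @eval adds a String method at global scope while this function is running.
    @eval toy_kind(x::String) = "string"
    # Calling toy_kind("a") directly here would throw a MethodError on the first
    # run: the running code is still in the older world age, which predates the
    # String method. Base.invokelatest dispatches in the newest world instead.
    return Base.invokelatest(toy_kind, "a")
end

add_method_then_call()  # returns "string"
```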

2 Answers


Using InMemoryDatasets.jl:

using InMemoryDatasets
df=Dataset("Category" => ["A", "B", "C"], "n" => [1,2,3])
repeat(df,freq=:n)
giantmoa

Using @eval is not recommended for regular data wrangling tasks. Here is an alternative method:

Define:

spread(vals,cnts) = 
  [v for (v,c) in zip(vals, cnts) for i in 1:c]
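A quick check of spread on plain vectors, independent of DataFrames. The spread definition is the one from this answer; spread_fill is a hypothetical name for an equivalent formulation (one fill per element, then concatenation), shown only for comparison.

```julia
# Repeats vals[i] exactly cnts[i] times (definition from the answer).
spread(vals, cnts) = [v for (v, c) in zip(vals, cnts) for i in 1:c]

# Hypothetical equivalent: fill.(vals, cnts) builds one vector per element,
# e.g. [["A"], ["B", "B"], ["C", "C", "C"]], and reduce(vcat, ...) joins them.
spread_fill(vals, cnts) = reduce(vcat, fill.(vals, cnts))

spread(["A", "B", "C"], [1, 2, 3])       # ["A", "B", "B", "C", "C", "C"]
spread_fill(["A", "B", "C"], [1, 2, 3])  # same result
spread(1:3, [1, 2, 3])                   # [1, 2, 2, 3, 3, 3]
```

Because spread accepts any iterable, applying it to row numbers (as in the last line) yields repeated row indices rather than repeated values.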

and now:

julia> combine(df, [:Category, :n] => spread => :Category)
6×1 DataFrame
 Row │ Category 
     │ String   
─────┼──────────
   1 │ A
   2 │ B
   3 │ B
   4 │ C
   5 │ C
   6 │ C

or (for all columns including n):

julia> combine(df, All() .=> (x -> spread(x, df.n)) .=> All())
6×2 DataFrame
 Row │ Category  n     
     │ String    Int64 
─────┼─────────────────
   1 │ A             1
   2 │ B             2
   3 │ B             2
   4 │ C             3
   5 │ C             3
   6 │ C             3
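As a further sketch (not taken from either answer), the same expansion can be applied to row numbers instead of values; indexing the data frame with the repeated row numbers then copies every column in one step, without @eval or combine. This assumes DataFrames.jl is installed; row_idx, df_all and df2 are illustrative names.

```julia
using DataFrames

df = DataFrame("Category" => ["A", "B", "C"], "n" => [1, 2, 3])

# Row i is listed df.n[i] times: here [1, 2, 2, 3, 3, 3].
row_idx = [i for (i, c) in enumerate(df.n) for _ in 1:c]

# Indexing with the repeated row numbers copies whole rows, all columns at once.
df_all = df[row_idx, :]

# Dropping the count column gives the shape asked for in the question.
df2 = df[row_idx, Not(:n)]
```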
Dan Getz