Suppose my DataFrame has two columns, v and g. First, I group the DataFrame by column g and calculate the sum of column v within each group. Second, I use the function maximum to retrieve the largest of those sums. Is it possible to retrieve that value in one step? Thanks.

julia> using DataFrames, Random

julia> Random.seed!(1)
TaskLocalRNG()

julia> dt = DataFrame(v = rand(15), g = rand(1:3, 15))
15×2 DataFrame
 Row │ v          g     
     │ Float64    Int64 
─────┼──────────────────
   1 │ 0.0491718      3
   2 │ 0.119079       2
   3 │ 0.393271       2
   4 │ 0.0240943      3
   5 │ 0.691857       2
   6 │ 0.767518       2
   7 │ 0.087253       1
   8 │ 0.855718       1
   9 │ 0.802561       3
  10 │ 0.661425       1
  11 │ 0.347513       2
  12 │ 0.778149       3
  13 │ 0.196832       1
  14 │ 0.438058       2
  15 │ 0.0113425      1

julia> gdt = combine(groupby(dt, :g), :v => sum => :v)
3×2 DataFrame
 Row │ g      v       
     │ Int64  Float64 
─────┼────────────────
   1 │     1  1.81257
   2 │     2  2.7573
   3 │     3  1.65398

julia> maximum(gdt.v)
2.7572966050340257


Likan Zhan
  • Just so I understand your question correctly: you're looking for the maximum of the sums over all three 'categories' indicated by the column `:g`? If so, that doesn't seem like a very routine operation, and I don't know if there's an easier way to do this. – Jakob Sachs Dec 29 '21 at 08:54
  • Yes, that is what I want. – Likan Zhan Dec 29 '21 at 10:46

2 Answers

I am not sure whether this is what you mean, but you can retrieve both the maximum sum v and its group g in one step from the grouped result gdt:

julia> v, g = findmax(x -> (x.v, x.g), eachrow(gdt))[1]
(2.7572966050340257, 2)
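
To make the behaviour of this call easier to check, here is a minimal sketch on a small hand-made table. The values are hypothetical and chosen so the group sums are exact; it assumes Julia 1.7 or later, where the two-argument form findmax(f, itr) is available. Because tuples compare element by element, putting the sum first in the tuple ranks the rows by sum.

```julia
using DataFrames

# Hypothetical data, not the question's random draw
dt = DataFrame(v = [1.0, 2.0, 4.0, 0.5, 3.0], g = [1, 2, 2, 3, 1])

# Per-group sums: g = 1 -> 4.0, g = 2 -> 6.0, g = 3 -> 0.5
gdt = combine(groupby(dt, :g), :v => sum => :v)

# findmax(f, itr) returns (largest f value, index where it occurs);
# the largest tuple (sum, group) is unpacked into vmax and gmax
(vmax, gmax), row = findmax(r -> (r.v, r.g), eachrow(gdt))
```

Here vmax holds the largest group sum and gmax the group it belongs to, while row is the position of that group in gdt.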
Przemyslaw Szufel

DataFramesMeta.jl (loaded with using DataFramesMeta) has an @by macro that groups and aggregates in a single call:

julia> @by(dt, :g, :sv = sum(:v))
3×2 DataFrame
 Row │ g      sv      
     │ Int64  Float64 
─────┼────────────────
   1 │     1  1.81257
   2 │     2  2.7573
   3 │     3  1.65398

This gives you somewhat neater syntax for the group-and-sum step.

With that, you can do either:

julia> @by(dt, :g, :sv = sum(:v)).sv |> maximum
2.7572966050340257

or (IMO more readably):

julia> @chain dt begin
         @by(:g, :sv = sum(:v))
         maximum(_.sv)
       end
2.7572966050340257
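
If you would rather not add DataFramesMeta.jl, the same one-step result can probably be had with DataFrames.jl alone. The sketch below uses a small hypothetical table (not the question's data) and shows two variants: iterating over the groups directly, and indexing the column of the combine result inline.

```julia
using DataFrames

# Hypothetical data, chosen so the group sums are exact
dt = DataFrame(v = [1.0, 2.0, 4.0, 0.5, 3.0], g = [1, 2, 2, 3, 1])

# Variant 1: each group is a SubDataFrame; sum its v column and keep the largest
max_sum = maximum(sum(sdf.v) for sdf in groupby(dt, :g))

# Variant 2: same pipeline as in the question, without the intermediate variable
max_sum2 = maximum(combine(groupby(dt, :g), :v => sum => :sv).sv)
```

Variant 1 avoids building the intermediate summary table, which may matter little for small data but keeps the expression short.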
Sundar R