I have a similar problem as described here, but none of the solutions from there which I have tried work.
Given a table like this:
Date Exercise Category Weight Reps EstMax RepxWeight Note
4/2/16 Deadlift Legs 135 7 166.4685 7x135 easy
4/2/16 Deadlift Legs 135 7 166.4685 7x135 kinda easy
4/2/16 Deadlift Legs 135 7 166.4685 7x135 tired
4/2/16 Bench Press Chest 95 5 110.8175 5x95 hard
4/2/16 Bench Press Chest 135 2 143.991 2x135 not hard
4/9/16 Bench Press Chest 135 2 143.991 2x135 a little hard
4/9/16 Bench Press Chest 135 2 143.991 2x135 super tired
4/18/16 Deadlift Legs 155 8 196.292 8x155 …
4/18/16 Deadlift Legs 155 5 180.8075 5x155 bad day
5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day
5/8/16 Deadlift Legs 185 3 203.4815 3x185 felt easy
5/8/16 Bench Press Chest 115 4 130.318 4x115 easy
5/8/16 Bench Press Chest 115 4 130.318 4x115 hard
I want to aggregate
to get the rows that have the max
value for a certain column (e.g. EstMax
) based on multiple other columns (e.g. Date
and Exercise
), but also keep all the other columns in the row. And in the case of multiple entries with the same max value, take the first entry.
The expected output would look like this:
Date Exercise Category Weight Reps EstMax RepxWeight Note
4/2/16 Deadlift Legs 135 7 166.4685 7x135 easy
4/2/16 Bench Press Chest 135 2 143.991 2x135 not hard
4/9/16 Bench Press Chest 135 2 143.991 2x135 a little hard
4/18/16 Deadlift Legs 155 8 196.292 8x155 …
5/8/16 Deadlift Legs 185 3 203.4815 3x185 good day
5/8/16 Bench Press Chest 115 4 130.318 4x115 hard
Examples of some method I've tried; in every case, the 'extra columns' end up being used as factors for the aggregation, which is not what I want.
data <- structure(list(Date = structure(c(2L, 2L, 2L, 2L, 2L, 3L, 3L,
1L, 1L, 4L, 4L, 4L, 4L), .Label = c("4/18/16", "4/2/16", "4/9/16",
"5/8/16"), class = "factor"), Exercise = structure(c(2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("Bench Press",
"Deadlift"), class = "factor"), Category = structure(c(2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("Chest",
"Legs"), class = "factor"), Weight = c(135L, 135L, 135L, 95L,
135L, 135L, 135L, 155L, 155L, 185L, 185L, 115L, 115L), Reps = c(7L,
7L, 7L, 5L, 2L, 2L, 2L, 8L, 5L, 3L, 3L, 4L, 4L), EstMax = c(166.4685,
166.4685, 166.4685, 110.8175, 143.991, 143.991, 143.991, 196.292,
180.8075, 203.4815, 203.4815, 130.318, 130.318), RepxWeight = structure(c(6L,
6L, 6L, 5L, 1L, 1L, 1L, 7L, 4L, 2L, 2L, 3L, 3L), .Label = c("2x135",
"3x185", "4x115", "5x155", "5x95", "7x135", "8x155"), class = "factor"),
Note = structure(c(4L, 8L, 11L, 7L, 9L, 2L, 10L, 1L, 3L,
6L, 5L, 4L, 7L), .Label = c("…", "a little hard", "bad day",
"easy", "felt easy", "good day", "hard", "kinda easy", "not hard",
"super tired", "tired"), class = "factor")), .Names = c("Date",
"Exercise", "Category", "Weight", "Reps", "EstMax", "RepxWeight",
"Note"), class = "data.frame", row.names = c(NA, -13L))
# base R
aggregate(EstMax ~ Date + Exercise, data = data, FUN = max)
# Date Exercise EstMax
# 1 4/2/16 Bench Press 143.9910
# 2 4/9/16 Bench Press 143.9910
# 3 5/8/16 Bench Press 130.3180
# 4 4/18/16 Deadlift 196.2920
# 5 4/2/16 Deadlift 166.4685
# 6 5/8/16 Deadlift 203.4815
aggregate(EstMax ~ Date + Exercise + RepxWeight + Note, data = data, FUN = max)
# Date Exercise RepxWeight Note EstMax
# 1 4/18/16 Deadlift 8x155 … 196.2920
# 2 4/9/16 Bench Press 2x135 a little hard 143.9910
# 3 4/18/16 Deadlift 5x155 bad day 180.8075
# 4 5/8/16 Bench Press 4x115 easy 130.3180
# 5 4/2/16 Deadlift 7x135 easy 166.4685
# 6 5/8/16 Deadlift 3x185 felt easy 203.4815
# 7 5/8/16 Deadlift 3x185 good day 203.4815
# 8 5/8/16 Bench Press 4x115 hard 130.3180
# 9 4/2/16 Bench Press 5x95 hard 110.8175
# 10 4/2/16 Deadlift 7x135 kinda easy 166.4685
# 11 4/2/16 Bench Press 2x135 not hard 143.9910
# 12 4/9/16 Bench Press 2x135 super tired 143.9910
# 13 4/2/16 Deadlift 7x135 tired 166.4685
# data table
library("data.table")
data_dt <- data.table(data)
data_dt[ , max(EstMax), by = c("Date", "Exercise")]
# Date Exercise V1
# 1: 4/2/16 Deadlift 166.4685
# 2: 4/2/16 Bench Press 143.9910
# 3: 4/9/16 Bench Press 143.9910
# 4: 4/18/16 Deadlift 196.2920
# 5: 5/8/16 Deadlift 203.4815
# 6: 5/8/16 Bench Press 130.3180
data_dt[, max(EstMax), .(Date, Exercise, Weight, Reps, RepxWeight, Note)]
# Date Exercise Weight Reps RepxWeight Note V1
# 1: 4/2/16 Deadlift 135 7 7x135 easy 166.4685
# 2: 4/2/16 Deadlift 135 7 7x135 kinda easy 166.4685
# 3: 4/2/16 Deadlift 135 7 7x135 tired 166.4685
# 4: 4/2/16 Bench Press 95 5 5x95 hard 110.8175
# 5: 4/2/16 Bench Press 135 2 2x135 not hard 143.9910
# 6: 4/9/16 Bench Press 135 2 2x135 a little hard 143.9910
# 7: 4/9/16 Bench Press 135 2 2x135 super tired 143.9910
# 8: 4/18/16 Deadlift 155 8 8x155 … 196.2920
# 9: 4/18/16 Deadlift 155 5 5x155 bad day 180.8075
# 10: 5/8/16 Deadlift 185 3 3x185 good day 203.4815
# 11: 5/8/16 Deadlift 185 3 3x185 felt easy 203.4815
# 12: 5/8/16 Bench Press 115 4 4x115 easy 130.3180
# 13: 5/8/16 Bench Press 115 4 4x115 hard 130.3180
Especially prefer base R solutions. Also saw the which.max()
function which might be helpful but couldn't figure out how to apply it to this.
Other related questions which I looked at but did not solve this:
Adding a non-aggregated column to an aggregated data set based on the aggregation of another column
Only keep min value for each factor level
How to select the row with the maximum value in each group
aggregating multiple columns in data.table
How to aggregate some columns while keeping other columns in R?