I currently have a data frame that looks like this:

        result 1    result 2    result 3    median 
item 1    8             7           6         7 
item 5    1             2           3         2 
item 1    6             5           4         5
item 5    3             4           5         4 

I want to remove duplicate items, keeping for each item the entry with the higher median. The problem is that the item labels (item 1, etc.) are row names rather than a column of their own, so I can't access them with $.

How can I accomplish this? Thanks in advance.

3 Answers

You can simply sort by median in decreasing order and then drop the duplicated items, i.e.

df <- df[order(df$median, decreasing = TRUE),]
df[!duplicated(df$row),]

which gives,

    row result1 result2 result3 median
1 item1       8       7       6      7
4 item5       3       4       5      4
Sotos
  • Sorry about the confusion but the "row" column is actually just the rownames (it's not its own column) - how do I tackle this? Thanks for your help. Seems like a real easy fix. – Alex Johanssen Feb 12 '18 at 08:41
  • nevermind, just added another column and took care of it. thanks for your help! – Alex Johanssen Feb 12 '18 at 08:55
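The workaround mentioned in the comments (copying the labels into a column of their own) can be sketched in base R as follows. This is a minimal sketch under assumptions: it supposes the data is a numeric matrix with repeated row names (a data.frame would not allow duplicate row names), and the names m, keys, df and dedup are hypothetical.

```r
# Assumed reconstruction of the question's data as a matrix with repeated row names.
m <- matrix(
  c(8, 7, 6, 7,
    1, 2, 3, 2,
    6, 5, 4, 5,
    3, 4, 5, 4),
  ncol = 4, byrow = TRUE,
  dimnames = list(c("item 1", "item 5", "item 1", "item 5"),
                  c("result1", "result2", "result3", "median"))
)

# Copy the labels into a vector, then drop them from the matrix so that
# data.frame() does not try to reuse them as (non-unique) row names.
keys <- rownames(m)
rownames(m) <- NULL
df <- data.frame(row = keys, m, stringsAsFactors = FALSE)

# Sort by median, highest first, then keep the first occurrence of each label.
df <- df[order(df$median, decreasing = TRUE), ]
dedup <- df[!duplicated(df$row), ]

# dedup keeps item 1 (median 7) and item 5 (median 4).
print(dedup)
```

Once the labels live in an ordinary column, any of the approaches in the answers can be applied unchanged.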

We can group by 'row' and then filter for the rows that have the maximum value of 'median' within each group:

library(dplyr)
df1 %>%
   group_by(row) %>% 
   filter(median == max(median))
# A tibble: 2 x 5
# Groups: row [2]
#   row    result1 result2 result3 median
#   <chr>    <int>   <int>   <int>  <int>
#1 item 1       8       7       6      7
#2 item 5       3       4       5      4

Note that filter keeps every row tied for the maximum 'median'. If there are ties and we want only the first matching row per group, use which.max with slice:

df1 %>%
    group_by(row) %>%
    slice(which.max(median))
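
The difference between keeping all tied rows and keeping only the first one can also be illustrated without dplyr. The following is a base-R analogue, not the dplyr code above; the data frame is hypothetical and contains a deliberate tie for 'item 1', and the names all_max and first_max are made up for the sketch.

```r
# Hypothetical data: 'item 1' reaches its maximum median (7) twice.
df1 <- data.frame(
  row    = c("item 1", "item 5", "item 1", "item 5", "item 1"),
  median = c(7, 2, 5, 4, 7),
  stringsAsFactors = FALSE
)

# ave() returns the per-group maximum aligned with the original rows,
# so this comparison keeps every row tied for its group's maximum.
all_max <- df1[df1$median == ave(df1$median, df1$row, FUN = max), ]

# Dropping repeated labels afterwards keeps only the first tied row per group.
first_max <- all_max[!duplicated(all_max$row), ]

print(nrow(all_max))    # 3 rows: both tied 'item 1' rows plus 'item 5'
print(nrow(first_max))  # 2 rows: one per item
```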
akrun

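Here is a solution with data.table. It adds the per-item maximum as a helper column (maxmed) and then keeps the rows whose median equals it; the helper column remains in the result: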

library("data.table")
D <- fread(
"item   result1    result2    result3    median
item1    8             7           6         7
item5    1             2           3         2
item1    6             5           4         5
item5    3             4           5         4")
D[, maxmed:=max(median), by=item][median==maxmed]
jogo