
An example df:

a = c("a", "b", "b", "b", "c")
b = c(1,4,3,2,5)
df = cbind.data.frame(a,b)

How do I remove duplicate rows, judged only by column "a", while keeping the first occurrence of each value? I want to keep the other columns for the rows that remain. Desired output:

a1 = c("c","b","a")
b1 = c(5,4,1)
df1 = cbind.data.frame(a1,b1)

I want to use the code within a dplyr pipe. For example,

df2 = df %>% arrange(desc(b)) %>% filter(b >= 1)

Thank you

ip2018

1 Answer


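How about indexing with duplicated() on the column itself, so the result does not depend on a separate vector named a in the workspace:

df[!duplicated(df$a), ]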

#   a b
# 1 a 1
# 2 b 4
# 5 c 5

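To use this in a pipe, wrap it in a function that takes the data frame as its first argument:

# Keep the first row for each distinct value in the named column
uniqueValues <- function(df, columnName){
    df[!duplicated(df[[columnName]]), ]
}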

> df %>% arrange(desc(b)) %>% filter(b >= 1) %>% uniqueValues('a')
#   a b
# 1 c 5
# 2 b 4
# 5 a 1
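
Since the question asks for something that sits inside a dplyr pipe, another option is dplyr's own distinct() with .keep_all = TRUE, which keeps the first row for each value of the given column and retains the other columns. This is a minimal sketch rather than part of the answer above; it rebuilds the example data with data.frame() (an assumption, equivalent to the cbind.data.frame() call in the question) so that it runs on its own.

```r
library(dplyr)

# Example data from the question, rebuilt so the snippet is self-contained
df <- data.frame(a = c("a", "b", "b", "b", "c"),
                 b = c(1, 4, 3, 2, 5))

# Sort and filter as in the question, then keep the first row for each
# value of `a`; .keep_all = TRUE retains the remaining columns (here `b`)
df2 <- df %>%
  arrange(desc(b)) %>%
  filter(b >= 1) %>%
  distinct(a, .keep_all = TRUE)

df2
```

Because distinct() keeps the first occurrence in the current row order, placing it after arrange(desc(b)) means the row with the largest b is kept for each value of a.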