Remove duplicate rows keeping the ones that appear first and all columns of a dataframe

Question

An example df:

a = c("a", "b", "b", "b", "c")
b = c(1,4,3,2,5)
df = cbind.data.frame(a,b)

How do I remove duplicate rows, checking for duplicates only in column "a" and keeping the row that appears first? I want to keep the other columns for the corresponding rows. Desired output:

a1 = c("c","b","a")
b1 = c(5,4,1)
df1 = cbind.data.frame(a1,b1)

I want to use the code within a dplyr pipe. For example,

df2 = df %>% arrange(desc(b)) %>% filter(b >= 1)

Thank you

Thiloshon Nagarajah · Answer 1 · 2018-06-25T18:55:12.757

0

How about,

df[!duplicated(df$a), ]

#   a b
# 1 a 1
# 2 b 4
# 5 c 5

To use it in a pipe, wrap it in a function:

uniqueValues <- function(df, columnName){
    df[!duplicated(df[columnName]), ]
}

df %>% arrange(desc(b)) %>% filter(b >= 1) %>% uniqueValues('a')
#   a b
# 1 c 5
# 2 b 4
# 5 a 1
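
As an alternative that avoids a helper function, dplyr's own distinct() takes a .keep_all argument that keeps the first row for each value and retains the other columns. A minimal sketch, assuming dplyr is installed and using the example data from the question (the name `result` is only for illustration):

```r
library(dplyr)

# Example data from the question
df <- data.frame(a = c("a", "b", "b", "b", "c"),
                 b = c(1, 4, 3, 2, 5))

# distinct() keeps the first row seen for each value of `a`;
# .keep_all = TRUE retains the remaining columns of that row.
result <- df %>%
  arrange(desc(b)) %>%
  filter(b >= 1) %>%
  distinct(a, .keep_all = TRUE)

print(result)
```

Because distinct() runs after arrange(desc(b)), "first" here means the row with the largest b for each value of a.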

edited Jun 25 '18 at 18:55

answered Jun 25 '18 at 18:44

Thiloshon Nagarajah


Cannot use this with a dplyr pipe! – ip2018 Jun 25 '18 at 18:47
Added example to be used with pipe. – Thiloshon Nagarajah Jun 25 '18 at 18:56
Thanks. Another solution I found is to use: `group_by(a) %>% summarise(b=head(b,1))` – ip2018 Jun 25 '18 at 19:00
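
Note that the summarise() approach in the last comment returns only `a` and the summarised `b`, so any further columns would be dropped. A sketch of a grouped variant that keeps every column, assuming dplyr and a hypothetical third column `extra` added purely for illustration:

```r
library(dplyr)

# Question data plus a hypothetical extra column to show that all columns survive
df <- data.frame(a = c("a", "b", "b", "b", "c"),
                 b = c(1, 4, 3, 2, 5),
                 extra = c("v", "w", "x", "y", "z"))

# slice(1) takes the first row of each group, with all of its columns
first_rows <- df %>%
  group_by(a) %>%
  slice(1) %>%
  ungroup()

print(first_rows)
```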
