How to remove duplicated rows from data frame in R

Question

How can I remove duplicated rows from this data frame:

  Area    Population
GOMBAK       668,694
GOMBAK       668,694
GOMBAK       668,694
  Batu       285,288
  Batu       285,288
 KLANG       842,146
 KLANG       842,146

so that it becomes:

GOMBAK       668,694
  Batu       285,288
 KLANG       842,146

I've tried this: MK <- sqldf('SELECT DISTINCT * FROM Muki_Sela') – Adam Aug 11 '15 at 05:25

Tim Biegeleisen · Accepted Answer · score 2 · 2015-08-11T05:57:08.743

Try using the duplicated() function, which flags every row that repeats an earlier row, and keep the rows where it is FALSE:

# Example data matching the question
df <- data.frame(Area=c("GOMBAK", "GOMBAK", "GOMBAK", "Batu", "Batu", "KLANG", "KLANG"),
                 Population=c(668694, 668694, 668694, 285288, 285288, 842146, 842146))
# Keep only rows that do not repeat an earlier row
df <- df[!duplicated(df), ]

> df
    Area Population
1 GOMBAK     668694
4   Batu     285288
6  KLANG     842146
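duplicated() returns a logical vector with TRUE for every row that is an exact copy of an earlier row, so negating it keeps the first occurrence of each. The sketch below assumes the data was read with the thousands separators intact, which is why Population ends up as a factor (the asker confirms it is one in the comments). The inline text and the name muki are hypothetical stand-ins for the asker's Muki_Sela data.

```r
# Hypothetical reconstruction of the asker's data: the commas in Population
# keep it from being parsed as a number, so it is read as text and,
# with stringsAsFactors = TRUE, stored as a factor.
txt <- "Area;Population
GOMBAK;668,694
GOMBAK;668,694
GOMBAK;668,694
Batu;285,288
Batu;285,288
KLANG;842,146
KLANG;842,146"
muki <- read.csv(text = txt, sep = ";", stringsAsFactors = TRUE)

# TRUE marks a row that is an exact copy of an earlier row:
# FALSE TRUE TRUE FALSE TRUE FALSE TRUE
dup_mask <- duplicated(muki)
print(dup_mask)

# Keep only the first occurrence of each row
dedup <- muki[!dup_mask, ]
print(dedup)
```

Deduplication works the same whether the columns are numeric, character or factor, because duplicated() compares the row values as a whole.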

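If you want to compute the total population, the following should work:

sum(as.numeric(gsub(",", "", as.character(df$Population))))

The conversion is necessary because your Population column is a factor, based on what you mentioned in the comment: sum() refuses factors, and calling as.numeric() directly on a factor returns its internal level codes rather than the printed values. Going through as.character() and removing the thousands separators avoids both problems, and it also works if the column is already numeric.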
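A minimal sketch of the factor pitfall behind the "'sum' not meaningful for factors" error. The three values are taken from the table in the question; the variable names are hypothetical.

```r
# Population stored as a factor, as in the question; the commas are part of the text
pop <- factor(c("668,694", "285,288", "842,146"))

# as.numeric() on a factor returns the internal level codes, not the values.
# Levels are sorted as text ("285,288", "668,694", "842,146"), so this gives 2 1 3.
codes <- as.numeric(pop)
print(codes)

# Convert to character first, drop the thousands separators, then to numeric
values <- as.numeric(gsub(",", "", as.character(pop)))
total <- sum(values)
print(total)  # 1796128
```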

edited Aug 11 '15 at 05:57

answered Aug 11 '15 at 05:06

Tim Biegeleisen


Yes, it does, thank you. Any idea how to sum the population column? I need the total. Sorry, I am new to R. – Adam Aug 11 '15 at 05:29
Thank you, bro. I got this error: Error in Summary.factor(c(124L, 77L, 49L, 135L, 26L, 144L, 23L, 75L, 113L, : ‘sum’ not meaningful for factors – Adam Aug 11 '15 at 05:52
Yes, correct, it is a factor. Thank you very much. – Adam Aug 11 '15 at 06:00
Why not just `unique( df )`? – vaettchen Aug 11 '15 at 07:37
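For data frames, base R's unique() drops repeated rows directly and keeps the first occurrence, so it should return the same rows as the duplicated() idiom in the accepted answer. A minimal check, reusing that answer's example data:

```r
df <- data.frame(Area = c("GOMBAK", "GOMBAK", "GOMBAK", "Batu", "Batu", "KLANG", "KLANG"),
                 Population = c(668694, 668694, 668694, 285288, 285288, 842146, 842146))

# unique() on a data frame removes rows that repeat an earlier row
u <- unique(df)

# Same result as subsetting with the negated duplicated() mask
d <- df[!duplicated(df), ]
print(identical(u, d))
```

Both keep the original row names (1, 4 and 6 here), so pick whichever reads more clearly.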

score 0 · Answer 2 · edited Aug 11 '15 at 06:13


library(sqldf)
sqldf('SELECT DISTINCT * FROM df')

edited Aug 11 '15 at 06:13

Tim Biegeleisen


answered Aug 11 '15 at 06:05

Adam


score 0 · Answer 3 · answered Aug 11 '15 at 06:29


Using dplyr:

library(dplyr)
# distinct() keeps only the unique rows of df
df %>% distinct()

answered Aug 11 '15 at 06:29

mpalanco

