2

How can I remove duplicated rows from a data frame? I want to turn

  Area    Population
GOMBAK       668,694
GOMBAK       668,694
GOMBAK       668,694
  Batu       285,288
  Batu       285,288
 KLANG       842,146
 KLANG       842,146

into

GOMBAK       668,694
  Batu       285,288
 KLANG       842,146
Adam

3 Answers

2

Try using the duplicated() function:

df <- data.frame(Area=c("GOMBAK", "GOMBAK", "GOMBAK", "Batu", "Batu", "KLANG", "KLANG"),
                 Population=c(668694, 668694, 668694, 285288, 285288, 842146, 842146))
df <- df[!duplicated(df), ]

> df
    Area Population
1 GOMBAK     668694
4   Batu     285288
6  KLANG     842146
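
To see what the subsetting step is doing, here is a minimal sketch on the same sample data (the names dup and deduped are only illustrative). duplicated() returns a logical vector that is TRUE for every row repeating an earlier one, so negating it keeps the first occurrence of each row. Base R's unique() should give the same result on a data frame.

```r
# Same sample data as above
df <- data.frame(
  Area = c("GOMBAK", "GOMBAK", "GOMBAK", "Batu", "Batu", "KLANG", "KLANG"),
  Population = c(668694, 668694, 668694, 285288, 285288, 842146, 842146)
)

# duplicated() flags each row that repeats an earlier row
dup <- duplicated(df)
print(dup)
# FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE

# Keep only the rows that were not flagged
deduped <- df[!dup, ]

# unique() applied to a data frame drops the same rows
print(unique(df))
```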

If you want to compute the sum of the population, then the following should work:

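sum(as.numeric(gsub(",", "", as.character(df$Population))))

Converting through as.character() is necessary because your population column is a factor, based on what you mentioned in the comment. Calling as.numeric() directly on a factor returns its integer level codes rather than the values, and the thousands separators have to be removed before the strings can be parsed as numbers.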
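
As a small worked example of the factor issue, the sketch below assumes the column was read in as a factor of comma-formatted strings such as "668,694". That is an assumption based on the question's data, and the names pop, codes, values and total are made up for illustration.

```r
# Assumed input: de-duplicated populations stored as a factor of strings
pop <- factor(c("668,694", "285,288", "842,146"))

# Pitfall: as.numeric() on a factor returns the internal level codes
codes <- as.numeric(pop)
print(codes)
# 2 1 3

# Convert to character, strip the commas, then parse as numbers
values <- as.numeric(gsub(",", "", as.character(pop)))

total <- sum(values)
print(total)
# 1796128
```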

Tim Biegeleisen
0
library(sqldf)
sqldf('SELECT DISTINCT * FROM df')
Tim Biegeleisen
Adam
0

Using dplyr:

library(dplyr)
df %>% distinct()
mpalanco