How can I remove all duplicate codes and keep only the first occurrence of each code?

Question

This is my data.

code    long    lat
a   103.0059509 1.736281037
a   103.0055008 1.736822963
a   103.0049973 1.737220049
a   103.0044479 1.737781048
a   103.0041733 1.737781048
b   103.003891  1.738060951
b   103.0022202 1.738055944
b   103.0019455 1.738332033
b   103.0013885 1.738332033
b   103.0011139 1.738610029
c   103.0008316 1.738610029
c   103.0005569 1.738891006
c   103.000267  1.738891006
c   103         1.738610029

I want the code column to show each code only once, on its first row, with no duplicates below it. The long and lat values should stay where they are.

Sorry for the duplicate question. I just couldn't think of any method to replace my blank spaces with NA. – ahmad fikri, May 12 '16 at 01:27

akrun · Accepted Answer · 2016-05-31T06:44:33.053

Assuming that the 'code' column is of character class, we replace the empty strings ("") with NA and then use na.locf from the zoo package to replace each NA with the previous non-NA value.

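library(zoo)
# blank strings become NA so that na.locf treats them as missing
df1$code[df1$code==""] <- NA
# carry the last non-NA code forward into the NA rows
df1$code <- na.locf(df1$code)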
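If zoo is not available, the same forward fill can be done in base R. This is a sketch, not the answer's method: it rebuilds the question's data (codes only on the first row of each group, assumed to be character) and replaces na.locf with a cummax index trick that finds, for every row, the most recent non-NA code at or above it.

```r
# Question's data as posted: the code appears only on the first row of each group
df1 <- data.frame(
  code = rep(c("a", "", "b", "", "c", ""), times = c(1, 4, 1, 4, 1, 3)),
  long = c(103.0059509, 103.0055008, 103.0049973, 103.0044479, 103.0041733,
           103.003891, 103.0022202, 103.0019455, 103.0013885, 103.0011139,
           103.0008316, 103.0005569, 103.000267, 103),
  lat = c(1.736281037, 1.736822963, 1.737220049, 1.737781048, 1.737781048,
          1.738060951, 1.738055944, 1.738332033, 1.738332033, 1.738610029,
          1.738610029, 1.738891006, 1.738891006, 1.738610029),
  stringsAsFactors = FALSE
)

# Same first step as the answer: blanks become NA
df1$code[df1$code == ""] <- NA

# Base-R stand-in for na.locf: index of the latest non-NA value at or above each row
keep <- !is.na(df1$code)
idx <- cummax(seq_along(df1$code) * keep)
idx[idx == 0] <- NA   # leading NAs have no earlier value to carry forward
df1$code <- df1$code[idx]

df1
```

Only the code column is rewritten, so long and lat stay on their original rows.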

If we want to get the original layout back from the filled output, use data.table. Convert the data.frame to a data.table (setDT(df1)), group by 'code', and get the row indices (.I) of every row in each group except the first (.I[-1] rather than .I[2:.N], since 2:.N counts down to 1 when a code has only one row). Because we don't name that output column, data.table calls it 'V1' by default. Extract that column ($V1), use it as 'i', and assign (:=) "" to 'code' on those rows. The trailing [] prints the result.

library(data.table)
setDT(df1)[df1[, .I[-1], code]$V1, code := ""][]
#  code     long      lat
# 1:    a 103.0060 1.736281
# 2:      103.0055 1.736823
# 3:      103.0050 1.737220
# 4:      103.0044 1.737781
# 5:      103.0042 1.737781
# 6:    b 103.0039 1.738061
# 7:      103.0022 1.738056
# 8:      103.0019 1.738332
# 9:      103.0014 1.738332
#10:      103.0011 1.738610
#11:    c 103.0008 1.738610
#12:      103.0006 1.738891
#13:      103.0003 1.738891
#14:      103.0000 1.738610

More information about data.table can be found in its vignettes.
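For comparison, the same blanking step can be sketched in base R with duplicated() instead of data.table. This assumes df1 has a code on every row and that code is character; the data below is the question's data with the codes filled in, reconstructed for illustration.

```r
# Question's data with a code on every row (reconstructed for illustration)
df1 <- read.table(text = "
code long lat
a 103.0059509 1.736281037
a 103.0055008 1.736822963
a 103.0049973 1.737220049
a 103.0044479 1.737781048
a 103.0041733 1.737781048
b 103.003891 1.738060951
b 103.0022202 1.738055944
b 103.0019455 1.738332033
b 103.0013885 1.738332033
b 103.0011139 1.738610029
c 103.0008316 1.738610029
c 103.0005569 1.738891006
c 103.000267 1.738891006
c 103 1.738610029
", header = TRUE, stringsAsFactors = FALSE)

# Blank every code that has already appeared higher up in the column
df1$code[duplicated(df1$code)] <- ""

df1
```

Like grouping by code in data.table, duplicated() looks at the whole column, so a code that reappears further down after a different code would also be blanked.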

edited May 31 '16 at 06:44

answered May 11 '16 at 09:12

akrun

What is the approach if my data is the reverse of this? – ahmad fikri May 31 '16 at 05:37
@ahmadfikri If you want to do it from the reverse, use `na.locf(df1$code, fromLast=TRUE, na.rm=FALSE)` – akrun May 31 '16 at 05:39
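With fromLast = TRUE, each NA is filled with the next non-NA value below it instead of the one above. A base-R sketch of that behavior, in case zoo is not available; fill_backward() is a hypothetical helper, not part of zoo, meant only to mirror na.locf(x, fromLast = TRUE, na.rm = FALSE).

```r
# Hypothetical helper: backward fill by reversing, filling forward, and reversing back
fill_backward <- function(x) {
  rev_x <- rev(x)
  keep <- !is.na(rev_x)
  idx <- cummax(seq_along(rev_x) * keep)  # latest non-NA position in reversed order
  idx[idx == 0] <- NA                     # trailing NAs have nothing below them to copy
  rev(rev_x[idx])
}

fill_backward(c(NA, NA, "a", NA, "b", NA))
# "a" "a" "a" "b" "b" NA
```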
Meaning that I have the code on every row, and I want it to look like my data above – ahmad fikri May 31 '16 at 05:55
@ahmadfikri Sorry, I am not sure what you meant. Can you update your post with the expected – akrun May 31 '16 at 05:58
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/113373/discussion-between-ahmad-fikri-and-akrun). – ahmad fikri May 31 '16 at 06:24
Thanks again @akrun – ahmad fikri May 31 '16 at 07:01