This is my data.

code  long         lat
a     103.0059509  1.736281037
a     103.0055008  1.736822963
a     103.0049973  1.737220049
a     103.0044479  1.737781048
a     103.0041733  1.737781048
b     103.003891   1.738060951
b     103.0022202  1.738055944
b     103.0019455  1.738332033
b     103.0013885  1.738332033
b     103.0011139  1.738610029
c     103.0008316  1.738610029
c     103.0005569  1.738891006
c     103.000267   1.738891006
c     103          1.738610029

I want the code column to show each code only on its first row, with no duplicates. The long and lat values should stay where they are.

ahmad fikri

1 Answer

Assuming that the 'code' column is of class character, we replace the empty strings ("") with NA and then use na.locf from the zoo package to replace each NA with the previous non-NA value.

library(zoo)
df1$code[df1$code==""] <- NA
df1$code <- na.locf(df1$code)
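
As a minimal sketch of the two lines above, the example below builds a small data frame and fills its blanks. The input is hypothetical: it assumes 'code' is a character column that is blank on every row except the first of each run, and it reuses a few long/lat values from the question.

```r
library(zoo)

# Hypothetical input: 'code' is blank except on the first row of each run
df1 <- data.frame(
  code = c("a", "", "", "b", "", "c"),
  long = c(103.0059509, 103.0055008, 103.0049973,
           103.003891, 103.0022202, 103.0008316),
  lat  = c(1.736281037, 1.736822963, 1.737220049,
           1.738060951, 1.738055944, 1.738610029),
  stringsAsFactors = FALSE  # keep 'code' as character, as the answer assumes
)

df1$code[df1$code == ""] <- NA  # mark the blanks as missing
df1$code <- na.locf(df1$code)   # carry the last non-NA value forward

df1$code
# [1] "a" "a" "a" "b" "b" "c"
```

The long and lat columns are not touched; only 'code' is rewritten.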

To get the original layout back from that output (each code shown only on its first row), use data.table. Convert the data.frame to a data.table with setDT(df1). Grouped by 'code', .I[2:.N] returns the row indices of the second through the last row (.N) of each group. Because that column is not given a name, data.table calls it 'V1' by default. Extract it with $V1 and use it as 'i' to assign (:=) "" to 'code' on those rows.

library(data.table)
setDT(df1)[df1[, .I[2:.N] , code]$V1, code := ""][]
#  code     long      lat
# 1:    a 103.0060 1.736281
# 2:      103.0055 1.736823
# 3:      103.0050 1.737220
# 4:      103.0044 1.737781
# 5:      103.0042 1.737781
# 6:    b 103.0039 1.738061
# 7:      103.0022 1.738056
# 8:      103.0019 1.738332
# 9:      103.0014 1.738332
#10:      103.0011 1.738610
#11:    c 103.0008 1.738610
#12:      103.0006 1.738891
#13:      103.0003 1.738891
#14:      103.0000 1.738610
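
A base R alternative, which is not part of the answer above, is sketched below for the case where loading data.table is not wanted. It assumes 'code' is a character column and uses only the first two rows of each code from the question for brevity. duplicated() flags every occurrence of a value after its first, so those rows can be blanked directly.

```r
# First two rows of each code from the question's data
df1 <- data.frame(
  code = c("a", "a", "b", "b", "c", "c"),
  long = c(103.0059509, 103.0055008, 103.003891,
           103.0022202, 103.0008316, 103.0005569),
  lat  = c(1.736281037, 1.736822963, 1.738060951,
           1.738055944, 1.738610029, 1.738891006),
  stringsAsFactors = FALSE  # assigning "" to a factor column would give NA
)

# duplicated() is TRUE for every repeat of a value after its first occurrence
df1$code[duplicated(df1$code)] <- ""

df1$code
# [1] "a" ""  "b" ""  "c" ""
```

This form also leaves a code that appears only once untouched, whereas 2:.N counts backwards (2, 1) when a group has a single row.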

More information about data.table can be found in the package vignettes.

akrun