
I'm new to R and am working on a customer latency analysis.

  • The dataset has around 300,000 rows and 15 columns. Relevant columns include "Account", "Account Open Date", and "Shipment pick up date".

  • Account numbers are repeated, and I only want the row where each account number is recorded for the first time, not the subsequent rows.

    For example, account 610829952 appears in the 1st row as well as in the 5th row, the 6th row, and so on. I need to keep only the first of those rows, and I need to do this for every account number.

I am not sure how to do this. Could someone please help me with this?


Vvk

1 Answer


Base R has a function called duplicated(). For each element of a vector it returns TRUE if that value has already appeared earlier in the vector, so it tells you whether a value, such as an account number, has already been recorded.
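As a quick illustration on a small, made-up vector of account numbers (the name accounts is only for this example), only the repeats are flagged, never the first occurrence:

```r
# Hypothetical account numbers; 4 and 3 each appear twice
accounts <- c(3, 2, 4, 4, 3)

# duplicated() is FALSE for the first occurrence of each value
# and TRUE for every later repeat
duplicated(accounts)
#> [1] FALSE FALSE FALSE  TRUE  TRUE
```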

First, call duplicated() on the account column to find which account numbers have already appeared in an earlier row. This gives a TRUE/FALSE vector, where TRUE means the value has already appeared. Then index your data.frame with the negation of that vector (!inds) so that only the first row for each account is kept. I will assume your data looks like df below:

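# Example data: 20 rows in which account numbers 1 to 5 repeat
df <- data.frame(account = c(3, 2, 4, 4, 4, 4, 5, 3, 3, 3,
                             4, 5, 4, 1, 5, 2, 5, 2, 4, 2),
                 segment = c("N", "V", "T", "Y", "M", "E", "H", "A", "J", "Y",
                             "R", "O", "O", "R", "U", "Q", "F", "J", "E", "H"))
df
#    account segment
# 1        3       N
# 2        2       V
# 3        4       T
# 4        4       Y
# 5        4       M
# 6        4       E
# 7        5       H
# 8        3       A
# 9        3       J
# 10       3       Y
# 11       4       R
# 12       5       O
# 13       4       O
# 14       1       R
# 15       5       U
# 16       2       Q
# 17       5       F
# 18       2       J
# 19       4       E
# 20       2       H

# TRUE marks rows whose account already appeared in an earlier row
inds <- duplicated(df$account)
inds
# [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
# [11]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

# Keep only the rows where each account appears for the first time
df <- df[!inds, ]
df
#    account segment
# 1        3       N
# 2        2       V
# 3        4       T
# 7        5       H
# 14       1       R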
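duplicated() keeps the first row in the data frame's current order, so if the rows are not already sorted chronologically, the "first" row for an account may not be its earliest record. A minimal sketch, assuming the "Shipment pick up date" column is what defines "first" (the question does not say which date matters), sorts by account and date before applying duplicated(). The data frame shipments and all of its values are hypothetical.

```r
# Hypothetical toy data using the question's column names;
# check.names = FALSE keeps the spaces in "Shipment pick up date"
shipments <- data.frame(
  Account = c(610829952, 123456789, 610829952, 123456789, 610829952),
  `Shipment pick up date` = as.Date(c("2017-03-05", "2017-02-01",
                                      "2017-01-15", "2017-04-10",
                                      "2017-05-20")),
  check.names = FALSE
)

# Sort by account, then by pick-up date, so each account's
# earliest shipment comes first within its group
ord <- order(shipments$Account, shipments[["Shipment pick up date"]])
sorted <- shipments[ord, ]

# Keep the first (i.e. earliest) row per account
first_shipments <- sorted[!duplicated(sorted$Account), ]
first_shipments
```

If the data was read with read.csv() using its default check.names = TRUE, the column will instead be named Shipment.pick.up.date, and the date column must be converted with as.Date() before sorting so that it orders by date rather than alphabetically.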
KenHBS