
I have a dataframe with a column called testId that contains many duplicates. The column also has many null (NA) values. I want to remove the duplicate testId values without removing the rows where testId is null.

I tried running this code:

```r
df4 <- df3[!duplicated(df3$testId), ]
```

It removed the duplicate test IDs as intended, but it also removed too much: every row with a null test ID after the first one was dropped, leaving my dataframe with far fewer rows than I expected.
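A minimal illustration of that behavior, assuming the null test IDs are stored as NA (the is.na()-based helpers below suggest they are). The vector `ids` is made-up sample data: `duplicated()` compares NA like any other value, so every NA after the first is flagged as a duplicate and gets dropped by `!duplicated()`.

```r
# Hypothetical sample IDs: two "a", one "b", three NA
ids <- c("a", "a", NA, "b", NA, NA)

# The first NA counts as "new"; the later NAs count as repeats of it
duplicated(ids)
#> [1] FALSE  TRUE FALSE FALSE  TRUE  TRUE
```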

My goal is to leave the rows with null test IDs untouched while still removing all the duplicates among the other IDs. I started with a logical function that checks whether a value is not NA:

```r
is.not_NA <- function(x) !is.na(x)
```

Then I wrote a function that applies it to each element of a vector, so I could use it on the dataframe's testId column later:

```r
is.not_NAs <- function(vector) {
  vapply(vector, is.not_NA, TRUE)
}
```
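As an aside, `is.na()` is already vectorized, so the `vapply()` wrapper should not be needed. This sketch compares the two on a hypothetical `ids` vector; `unname()` strips the names that `vapply()` attaches when its input is a character vector.

```r
is.not_NA <- function(x) !is.na(x)
is.not_NAs <- function(vector) {
  vapply(vector, is.not_NA, TRUE)
}

# Hypothetical sample IDs
ids <- c("a", "a", NA, "b", NA)

# Element-by-element version vs. the built-in vectorized call
!is.na(ids)
#> [1]  TRUE  TRUE FALSE  TRUE FALSE
identical(unname(is.not_NAs(ids)), !is.na(ids))
#> [1] TRUE
```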

Below is the code I used to iterate over the result for the testId column. I tried writing an if statement to isolate the non-null values so I could remove duplicates from only that subset:

```r
for (j in is.not_NAs(df3$testId)) {
  if (j == TRUE) {
    #print(j)
    df4 <- df3[!duplicated(df3$testId), ]
  }
}
df4
```

The issue is that this doesn't do anything different from my first attempt: it still drops too many rows (the ones with null test IDs). I understand why (the body of the if statement never uses j, so every iteration reruns the same deduplication on the whole dataframe), but I'm not sure what to put inside the if statement to achieve my goal.
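One way to make a loop like this work, sketched on made-up data (the `df3` contents, the `score` column, and the helper variables `keep` and `seen` are all hypothetical): instead of rebuilding `df4` inside the if statement, decide row by row whether to keep it, always keeping NA IDs and keeping a non-NA ID only the first time it appears, then subset once after the loop.

```r
# Hypothetical stand-in for df3; stringsAsFactors = FALSE keeps testId character
df3 <- data.frame(testId = c("a", "a", NA, "b", NA, "b", NA),
                  score  = 1:7,
                  stringsAsFactors = FALSE)

keep <- logical(nrow(df3))  # one TRUE/FALSE decision per row
seen <- character(0)        # non-NA IDs encountered so far

for (i in seq_len(nrow(df3))) {
  id <- df3$testId[i]
  if (is.na(id)) {
    keep[i] <- TRUE            # null IDs are always kept
  } else if (!(id %in% seen)) {
    keep[i] <- TRUE            # first occurrence of a non-null ID
    seen <- c(seen, id)
  }                            # later occurrences stay FALSE
}

df4 <- df3[keep, ]
df4  # keeps rows 1, 3, 4, 5 and 7
```

Growing `seen` with `c()` is fine for small data but gets slow on large columns; a vectorized logical mask avoids the explicit loop entirely.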

How can I remove the duplicates of only the non-null test IDs in a dataframe?
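A vectorized sketch of the same idea, again on hypothetical sample data standing in for `df3`: combine `!duplicated()` with `is.na()` so a row is kept if its ID is either new or missing. Base R's `duplicated()` also accepts an `incomparables` argument, and values listed there are never marked as duplicated, so `incomparables = NA` should give the same rows. Both versions assume the null IDs are NA rather than, say, empty strings.

```r
# Hypothetical stand-in for df3
df3 <- data.frame(testId = c("a", "a", NA, "b", NA, "b", NA),
                  score  = 1:7,
                  stringsAsFactors = FALSE)

# Keep a row if its testId has not appeared before OR if it is NA
df4 <- df3[!duplicated(df3$testId) | is.na(df3$testId), ]

# Alternative: mark NA as incomparable so duplicated() never flags it
df4_alt <- df3[!duplicated(df3$testId, incomparables = NA), ]

df4  # keeps rows 1, 3, 4, 5 and 7
```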

  • It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jun 29 '22 at 14:54
  • Does this answer your question? [Remove duplicates while keeping NA in R](https://stackoverflow.com/questions/48448741/remove-duplicates-while-keeping-na-in-r) – megmac Jun 29 '22 at 18:51
