2

How do find values that occur on the same date between two different columns in the same dataframe. Delete the values occurring in A01_CD columns and replace it with NA. And in cases where A01_CD has values, it should be moved to A01 and NA should be placed in A01. I have been trying to used duplicate () and Unique () and ifelse, but I am failing. Kindly assist.

My dataframe

    Date         A01        A01_CD
1   1966/05/07  4.870000    4.870
2   1966/05/08  4.918333    NA
3   1966/05/09  4.892000    4.860
4   1966/05/10  4.858917    NA
5   1966/05/11  4.842000    NA
211 1967/03/18  NA          5.95

Desired Outcome

    Date         A01        A01_CD
1   1966/05/07  4.870000    NA
2   1966/05/08  4.918333    NA
3   1966/05/09  4.892000    NA
4   1966/05/10  4.858917    NA
5   1966/05/11  4.842000    NA
211 1967/03/18  5.95        NA
Duck
  • 39,058
  • 13
  • 42
  • 84

2 Answers2

2

An option with coalesce which would return the first non-NA element across different columns given as argument for each row

library(dplyr)
df1 %>%
   transmute(Date, A01 = coalesce(A01, A01_CD), A01_CD = NA_real_)
#       Date      A01 A01_CD
#1 1966/05/07 4.870000     NA
#2 1966/05/08 4.918333     NA
#3 1966/05/09 4.892000     NA
#4 1966/05/10 4.858917     NA
#5 1966/05/11 4.842000     NA
#6 1967/03/18 5.950000     NA

Or in base R with row/column indexing

df1$A01 <- df1[-1][cbind(seq_len(nrow(df1)), max.col(!is.na(df1[-1]), 'first'))]
df1$A01
#[1] 4.870000 4.918333 4.892000 4.858917 4.842000 5.950000

data

df1 <- structure(list(Date = c("1966/05/07", "1966/05/08", "1966/05/09", 
"1966/05/10", "1966/05/11", "1967/03/18"), A01 = c(4.87, 4.918333, 
4.892, 4.858917, 4.842, NA), A01_CD = c(4.87, NA, 4.86, NA, NA, 
5.95)), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "211"))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Nice Akrun. I guess adding a `A01_CD = NA` would allow a complete match to the expected outcome (though why the OP wants an NA column isn't clear) – Allan Cameron Sep 19 '20 at 21:26
1
DF <- DF %>% mutate(A01 = ifelse(is.na(A01), A01_CD, A01),
                    A01_CD =NA)

This has also helped. I needed the NA column was just to track if what I wanted to do was happening.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213