
I have the following data set as a data frame in R:

article_number   1st_cutoff_date   2nd_cutoff_date
abc              12/01/2019        01/14/2020
def              02/10/2020        02/10/2020

In rows where 1st_cutoff_date == 2nd_cutoff_date, I want to replace 2nd_cutoff_date with a blank value " ". So in the second row ('def'), 2nd_cutoff_date would become " ".

The data frame consists of factors and contains NAs. I have converted it to character and tried the following:

AAR_FTW_Final_w_LL[AAR_FTW_Final_w_LL$`1st_Booking_Deadline` == AAR_FTW_Final_w_LL$`2nd_Booking_Deadline`, c("2nd_Booking_Deadline")] <- " "

and

ind<- AAR_FTW_Final_w_LL$`1st_Booking_Deadline` == AAR_FTW_Final_w_LL[`2nd_Booking_Deadlilne`]
AAR_FTW_Final_w_LL[ind, c("2nd_Booking_Deadline")] <- " "

Both return the error:

Error in AAR_FTW_Final_w_LL$`1st_Booking_Deadline` : 
  $ operator is invalid for atomic vectors

I have tried replacing the $ with [], but then I get an error saying that one of the columns is missing. Is there an easier way to do this task?

XCCH004

1 Answer


First, convert the columns from factor to character:

df[] <- lapply(df, as.character)

Then use `replace`:

transform(df, `2nd_cutoff_date` = replace(`2nd_cutoff_date`, 
                            `1st_cutoff_date` == `2nd_cutoff_date`, ''))

#  article_number X1st_cutoff_date X2nd_cutoff_date
#1            abc       12/01/2019       01/14/2020
#2            def       02/10/2020                 

`transform` adds an X to the column names because names that start with a number are not syntactically valid in R.
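The renaming can be reproduced and undone with a short sketch. The data frame `toy` below is a hypothetical stand-in for the question's data, not the asker's real object; `make.names()` and the `check.names` argument of `data.frame()` are base R.

```r
# make.names() shows what R considers a syntactically valid version of a name.
make.names(c("article_number", "1st_cutoff_date", "2nd_cutoff_date"))

# Hypothetical toy data; check.names = FALSE keeps the names exactly as typed.
toy <- data.frame(
  article_number    = c("abc", "def"),
  `1st_cutoff_date` = c("12/01/2019", "02/10/2020"),
  `2nd_cutoff_date` = c("01/14/2020", "02/10/2020"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
names(toy)

# transform() returns a modified copy and may rename the columns,
# so copy the original names back if they matter downstream.
res <- transform(toy, `2nd_cutoff_date` = replace(`2nd_cutoff_date`,
                        `1st_cutoff_date` == `2nd_cutoff_date`, ""))
names(res) <- names(toy)
res
```

Because `transform` returns a copy rather than modifying its input, the result has to be assigned (here to `res`) for the change to persist.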


Another approach, once the data has been converted to character, would be:

df$`2nd_cutoff_date`[df$`1st_cutoff_date` == df$`2nd_cutoff_date`] <- ""
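The question mentions that the real data contains NAs, and a comparison involving NA yields NA rather than TRUE or FALSE. A minimal sketch of a more defensive variant, assuming a hypothetical data frame `toy` with one missing date, wraps the comparison in `which()` so that only rows where it is definitely TRUE are touched:

```r
# Hypothetical toy data with an NA in the first date column.
toy <- data.frame(
  article_number    = c("abc", "def", "ghi"),
  `1st_cutoff_date` = c("12/01/2019", "02/10/2020", NA),
  `2nd_cutoff_date` = c("01/14/2020", "02/10/2020", "03/01/2020"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# The raw comparison contains NA for the row with a missing date.
same <- toy$`1st_cutoff_date` == toy$`2nd_cutoff_date`
same

# which() drops NA and FALSE, returning only the positions that are TRUE.
rows <- which(same)
toy$`2nd_cutoff_date`[rows] <- ""
toy
```

Rows with a missing date are left unchanged, which is usually the safer default when "equal" cannot be established.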

data

df <- structure(list(article_number = structure(1:2, .Label = c("abc", 
"def"), class = "factor"), `1st_cutoff_date` = structure(2:1, 
.Label = c("02/10/2020", "12/01/2019"), class = "factor"), 
`2nd_cutoff_date` = structure(1:2, .Label = c("01/14/2020", 
"02/10/2020"), class = "factor")), class = "data.frame", row.names = c(NA, -2L))
Ronak Shah
  • Why the backticks? "1st_cutoff_date" should be a valid name. – thelatemail Dec 12 '19 at 04:15
  • I am relatively new to R; it seems to add them automatically, but I keep getting an error when trying the above: Error in replace(`2nd_Booking_Deadline`, `1st_Booking_Deadline` == `2nd_Booking_Deadline`, : object '2nd_Booking_Deadline' not found – XCCH004 Dec 12 '19 at 04:18
  • It's like it doesn't recognize it. If I remove the backticks, the numerical part starts being treated like a number in R, i.e. it's highlighted differently – XCCH004 Dec 12 '19 at 04:18
  • @thelatemail It is not for me. Doesn't work when I remove the backticks. – Ronak Shah Dec 12 '19 at 04:20
  • @XCCH004 Are you sure you are using the correct column name? Do you have `2nd_Booking_Deadline` or `2nd_cutoff_date`? – Ronak Shah Dec 12 '19 at 04:21
  • my apologies - the columns are 1st_Booking_Deadline and 2nd_Booking_Deadline but the results are the same either way - sorry for the confusion – XCCH004 Dec 12 '19 at 04:22
  • I was reading the wrong part of the code – XCCH004 Dec 12 '19 at 04:22
  • @RonakShah - ahhh.... i wasn't thinking - it can't start with a number. – thelatemail Dec 12 '19 at 04:27
  • @XCCH004 Are you using the correct dataframe name? Step 1) `AAR_FTW_Final_w_LL[] <- lapply(AAR_FTW_Final_w_LL, as.character)` Step 2) ``AAR_FTW_Final_w_LL$`2nd_Booking_Deadline`[AAR_FTW_Final_w_LL$`1st_Booking_Deadline` == AAR_FTW_Final_w_LL$`2nd_Booking_Deadline`] <- ""`` What does it return to you? – Ronak Shah Dec 12 '19 at 04:28
  • that worked - sorry for the confusion...thank you so much - one question what does the [] do in the first step? – XCCH004 Dec 12 '19 at 04:56
  • @XCCH004 `lapply` returns a list. Using `[]` maintains its dimensions, so `AAR_FTW_Final_w_LL` remains a data frame. – Ronak Shah Dec 12 '19 at 05:03
  • excellent knowledge - thanks a ton – XCCH004 Dec 12 '19 at 05:08
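
The point about `[]` in the comments above can be illustrated with a small sketch. The data frame `toy` is hypothetical. The last part shows one way to end up with the "$ operator is invalid for atomic vectors" error; that this is what happened in the question is only an assumption, since the asker's conversion step is not shown.

```r
# Hypothetical toy data with factor columns, as in the question.
toy <- data.frame(
  id   = factor(c("abc", "def")),
  date = factor(c("02/10/2020", "12/01/2019"))
)

# Without []: the result of lapply() is a plain list, not a data frame.
as_list <- lapply(toy, as.character)
class(as_list)

# With []: the converted columns are written back into the existing
# data frame, so its class and shape are preserved.
kept <- toy
kept[] <- lapply(kept, as.character)
class(kept)
sapply(kept, class)

# Assumption: calling as.character() on the whole data frame is one way to
# get an atomic vector, on which $ then fails with the error from the question.
flat <- as.character(toy)
msg <- tryCatch(flat$id, error = function(e) conditionMessage(e))
msg
```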