1

I have 2 dataframes in R, I want to use all the columns from the first dataframe and add a column from a second dataframe. However, which column from the second dataframe will be determine by a value in the first dataframe. I have tried merge, with, and others but just not sure what I am doing.

DataFrame 1

ODS_Data= data.frame(ContractNumber=c(1,2,1,2),
       TransactionEffectiveDateAdj=as.Date(c('2016-08-01','2016-08-01','2016-09-01','2016-09-01')),
       TransactionTypeCT=c('AF','FS','SW','FS'), GrossAmount=c(100,555,478,632))

DataFrame 2

OLAP_Data= data.frame(PolicyNumber=c(1,2,1,2),
    AsOfDate=as.Date(c('2016-08-01','2016-08-01','2016-09-01','2016-09-01')),
    AFGrossAmount=c(100,0,642,0), FSGrossAmount=c(0,555,0,632), SWGrossAmount=c(0,0,345,0))

The end result dataframe:

Comparison <- data.frame(TrxEffectiveDateAdj=as.Date(character()),
                 PolicyNumber=character(), 
                 TransactionType=character(),
                 ODSTotalGrossAmount=num(),
                 OLAPTotalGrossAmount=num(),
                 CompareResult=character())

How the Comparison dataframe should be populated:

Comparison$TrxEffectiveDateAdj = ODS_Data$TransactionDateAdj
Comparison$PolicyNumber = ODS_Data$ContractNumber
Comparison$TransactionType = ODS_Data$TransactionTypeCT
Comparison$ODSTotalGrossAmount = ODS_Data$GrossAmount
Comparison$OLAPTotalGrossAmount = If ODS_Data$TransactionTypeCT = 'AF' then OLAP_Data$AFGrossAmount
ElseIf ODS_Data$TransactionTypeCT = 'FS' Then OLAP_Data$FSGrossAmount
ElseIf ODS_Data$TransactionTypeCT = 'SW' then OLAP_Data$SWGrossAmount
End IF
Comarison$CompareResult = Comparison$ODSTotalGrossAmount - Comparison$OLAPTotalGrossAmount

Also, the data between the dataframes should be matched on ODS_Data$ContractNumber = OLAP_Data$PolicyNumber AND ODS_Data$TransactionEffectiveDateAdj = OLAP_Data$AsOfDate

That is a lot of information, but I am new to R and I would really appreciate the help. Thanks!

JRDew
  • 129
  • 1
  • 1
  • 13
  • This question is not the same as he suggested duplicate. The suggested duplicate just brings in the same column every time. My question brings in a different column based on the value of a column. Please let me know if I am missing something, thanks. – JRDew Sep 13 '16 at 13:43
  • Your "DataFrame 2" is in a "wide" format. The recommendation would be to reshape it to a long format and then do the merge. There are tons of questions already on this site about reshaping data. You may also want to look at the `reshape2` or `tidyr` packages. Or you can just do the merge first, and then create the new column based on the same rules you are using now but all values will be in the same table. – MrFlick Sep 13 '16 at 14:00
  • @MrFlick - Thanks, I am going with the merge first then create the new column approach. Not sure yet if that is the best way, or if I should reshape, but that is what I am going with for now. Thanks again. – JRDew Sep 13 '16 at 18:07

0 Answers0