0

Update: the first one is solved. The issue was caused by df2 because df2 is not a data.frame.

I have two tricky situations for my data (all data frames are huge).

1) df looks like:

ISO3     CurrencyCode        OriginalPrice 
USA            USD                   2.5
USA            n/a                   3.6
CAN            CAD                   2.8
...

and df2 is a currency code list looks like

ISO3      CurrencyCode
USA              USD
CAN              CAD
JAP              JPY
...

I tried to find those missing currency codes in df by using df2.

I tried the following code but not working:

setDT(df)[df2, CurrencyCode:= CurrencyCode, on = .(ISO3)]

Another issue is similar but more complex.

df looks like:

ID     PayDate        CurrencyCode   ISO3
1      2016/05/01      EUR            FIN
2      2019/01/14      CAD            CAN
...
10000  2015/07/31      USD            USA
10001  2018/12/07      CAD            CAN

df2 looks like:

StartDate     EndDate      CurrencyCode   Rate      ISO3
2015/01/01    2015/03/05     CAD          0.75      CAN
2017/05/08    2017/12/31     JPY          0.0091    JAP
....

2019/07/01    2019/08/31     JPY          0.0093    JAP

I want to make df looks like:

ID     PayDate        CurrencyCode   Rate    ISO3
1      2016/05/01      EUR           1.06    FIN
2      2019/01/14      CAD           0.85    CAN
...
10000  2015/07/31      USD           1       USA
10001  2018/12/07      CAD           0.75    CAN

and the rate is decided by the PayDate. The PayDate should locate between the StartDate and EndDate in df2. If there's no enough information from df2, then the rate should be defined by the StartDate closet to the PayDate.

This is my code:

setDT(df)[df2, Rate:= Rate, .(date =seq(StartDate, EndDate, by = "day")), by = .(ISO3)]

and again.... I got error.

  • Can you sshow a small reprodducible example with `dput`. There are some issues in the code. You. are doing a `join` without `on` and second, the assignment and expansion is happening on the same code, which would have a conflict because the assignment does create or update values in the original dataset, while expansion increases the number of rows and creates a totally new dataset – akrun Feb 13 '20 at 18:30
  • If you do join with `ISO3`, make sure that at least one of the dataset have no duplicate elements for ISO3 – akrun Feb 13 '20 at 18:33
  • Both df2 are unique. No duplicate rows. – mimibao1009 Feb 13 '20 at 18:36
  • the first df2, ISO3 is unique. However, the second one is now. Because the currency rates are varied by dates. In this case, which column should I use as an anchor? So confused. – mimibao1009 Feb 13 '20 at 18:38
  • the first code error says: `logical error. i is not a data.table, but 'on' argument is provided.` – mimibao1009 Feb 13 '20 at 18:55
  • @akrun, I fix the first issue. It's because df2 is not a data.table. I load df2 from an excel sheet directly. After using `df2 = data.frame(df2)`, problem solved!!! Now only the second one confused me. – mimibao1009 Feb 13 '20 at 18:57

1 Answers1

0

Here is an answer to the first 'problem'

library( data.table )

#or setDT(df);setDT(df2)

df <- fread("ISO3     CurrencyCode        OriginalPrice 
USA            USD                   2.5
USA            n/a                   3.6
CAN            CAD                   2.8")

df2 <- fread("ISO3      CurrencyCode
USA              USD
CAN              CAD
JAP              JPY")

#update join DF by reference with the CurrencyCode from df2
df[ df2, CurrencyCode := i.CurrencyCode, on = .(ISO3) ]
#    ISO3 CurrencyCode OriginalPrice
# 1:  USA          USD           2.5
# 2:  USA          USD           3.6
# 3:  CAN          CAD           2.8

For the second one, a non-equi join will probably do the trick. But without proper sample data, I cannot produce code. Please provide a better sample set, preferably one leading to your desired output

Wimpel
  • 26,031
  • 1
  • 20
  • 37
  • thank you. As I updated two hours ago, problem 1 was solved and I used the code I provided in my questions. The error was cased by df2. It seems reading data from Excel directly won't create a data.frame. I make df2 = data.frame(df2) then problem was solved. As for the second problem, the sample data is exactly what I probided. I tried to use the solution from here: https://stackoverflow.com/questions/23342647/how-to-match-by-nearest-date-from-two-data-frames However, my data frames are very large and the solution is not work. – mimibao1009 Feb 13 '20 at 21:25
  • My current solution is to use the method I found above and process the code by each currency rate. There are about 85 countries.... I cannot figure out a better way to handle it. Thank you again! – mimibao1009 Feb 13 '20 at 23:31