I have a data frame consisting of apx. 20 Mio. lines! The table looks as follows:
Output_I
cusip_id price
uidiso 100.5
undnns 90.2
xsodeid 45.5
uidiso 99.5
xsodeid 45.1
undnns 90.0
Now I have a second data frame consisting of the cusip_id plus additional information:
ouput_II
cusip_id ISIN
uidiso xs987346325
undnns ch438763282
xsodeid xs937349494
I'd like to merge output_I with output_II based on cusip_id in order to get the following results:
output_III
cusip_id price ISIN
uidiso 100.5 xs987346325
undnns 90.2 ch438763282
xsodeid 45.5 xs937349494
uidiso 99.5 xs987346325
xsodeid 45.1 xs937349494
undnns 90.0 ch438763282
The challenge is the size of output_I which consists of apx. 20 mio. lines. I have tried following codes:
library(dplyr)
output_III= left_join(output_I, output_II, by="cusip_id")
library(data.table)
ldt = data.table::data.table(output_I)
rdt = data.table::data.table(output_II, key = c("cusip_id", "ISIN")
output_III= merge(ldt, rdt)
Is there probably a way to use dplyr and group_by?
Or is this not possible with such a big data frame?
I appreciate your feedback.