nomatch argument in R data.table causing bad behavior

Question

I have 2 data tables and am trying to get a column from cortable into finaltable.

cortable

cor,tickerkey
0.7539,AAL_AAN
0.573,AAL_ABB
0.6384,AAL_ACM
0.7193,AAL_ACXM
0.8386,AAL_ADP
0.7392,AAL_ADT
0.732,AAL_AER
0.4805,AAL_AGCO
0.9363,AAL_AL
0.9064,AAL_ALK
0.7545,AAL_ALSN
0.8586,AAL_AME
0.3356,AAL_AMT
0.8239,AAL_AN
0.8637,AAL_AOS
0.7638,AAL_APD
0.7915,AAL_APH
0.8785,AAL_APOL
0.8073,AAL_ARMH
0.7744,AAL_ASH
0.4179,AAL_ATLS
0.8282,AAL_AWI
-0.2539,AAL_AWK
0.8213,AAL_AXLL
0.827,AAL_BA
0.8642,AAL_BC
0.7982,AAL_BCO
0.2002,AAL_BEAV
0.7079,AAL_BERY
0.858,AAL_BGC
0.5943,AAL_BRK.B
0.1522,AAL_BWC
0.2793,AAL_CAR
0.8537,AAL_CAT
0.9115,AAL_CBI

finaltable

tickerkey,ticker1,ticker2
AAL_ALK,AAL,ALK
AAL_CAR,AAL,CAR
AAL_CHRW,AAL,CHRW
AAL_CNW,AAL,CNW
AAL_CSX,AAL,CSX
AAL_DAL,AAL,DAL
AAL_EXPD,AAL,EXPD
AAL_FDX,AAL,FDX
AAL_HTZ,AAL,HTZ
AAL_JBHT,AAL,JBHT

I am adding the column to finaltable with:

setkey(cortable, "tickerkey")    
setkey(finaltable, "tickerkey")

finaltable[cortable,cor:=cor,allow.cartesian=TRUE,nomatch=0]

The expected output would be finaltable:

tickerkey,ticker1,ticker2,cor
AAL_ALK,AAL,ALK,0.9064
AAL_CAR,AAL,CAR,0.2793

with the rest of the rows having NA for cor.

Instead, it gives this output:

finaltable

tickerkey,ticker1,ticker2,cor
AAL_ALK,AAL,ALK,0.2793
AAL_CAR,AAL,CAR,0.9064

with the rest of the rows NA for cor, and this warning on execution:

In [.data.table(finaltable, cortable, :=(cor, cor), allow.cartesian = TRUE, : Supplied 2 items to be assigned to 35 items of column 'cor' (recycled leaving remainder of 1 items).

If I remove the nomatch argument, the mismatch doesn't happen.

I tried to look into the definition and behaviour of nomatch, but didn't find much in the context of the above usage. Any explanation would be very helpful.

Can you show the expected output. When I run this, I get column2 as NA for column1 value as string3.1 — akrun, Jul 08 '15 at 17:16
I think I missed another feature of the data which is causing the problem. The number of rows in datatable2 is more than that of datatable1. So added couple more rows above in the data and showed the warning that comes up when the r script is run. — user2956863, Jul 08 '15 at 17:31
I get the expected output using the devel version of data.table — akrun, Jul 08 '15 at 17:38
I see the correct result being obtained for the simpler example shown above on my end also, the dataset I have is a much bigger one. Not sure what is causing the difference, more number of duplicates? more extra column values in one data table? a combination of the 2? not sure if there is way to upload the exact data set here.. — user2956863, Jul 08 '15 at 18:41
You have to show a reproducible example. I thought the behavior was for this particular example. — akrun, Jul 08 '15 at 19:20
A reference on making a reproducible example: http://stackoverflow.com/a/28481250/1191259 In the process of constructing an example, you'll have to whittle the problem down to a case that (i) illustrates the problem and (ii) is minimal in the sense of not giving us information that is obviously irrelevant to the problem. — Frank, Jul 08 '15 at 19:22
I apologize for just putting in a code prototype for the issue. I played around with the dataset, brought it down to the minimal amount of data which can reproduce that behaviour. Edited the data, behaviour shown above. Thanks again for being patient. — user2956863, Jul 08 '15 at 20:34
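
For readers who want to reproduce the lookup on a small case, here is a minimal sketch using a three-row subset of the tables above. It does not diagnose the warning in the question. It only shows two ways of copying cor across by tickerkey that make the source column explicit and do not use nomatch. It assumes a data.table version that supports the on= argument (1.9.6 or later, which is newer than the release current when the question was asked). The column name cor2 is made up for the comparison.

```r
library(data.table)

# Subset of the question's data: two keys that match, one that does not.
cortable <- data.table(
  cor       = c(0.9064, 0.2793, 0.7539),
  tickerkey = c("AAL_ALK", "AAL_CAR", "AAL_AAN")
)
finaltable <- data.table(
  tickerkey = c("AAL_ALK", "AAL_CAR", "AAL_CHRW"),
  ticker1   = "AAL",
  ticker2   = c("ALK", "CAR", "CHRW")
)

# Update join: for each row of cortable, find the matching rows of
# finaltable and assign by reference. The i. prefix names the column
# coming from cortable (the i argument), so it cannot be confused with
# a column of finaltable. Rows of finaltable with no match are left NA.
finaltable[cortable, cor := i.cor, on = "tickerkey"]

# Base R lookup for comparison (cor2 is a hypothetical column name):
# match() returns NA for keys that are absent from cortable.
finaltable[, cor2 := cortable$cor[match(tickerkey, cortable$tickerkey)]]

print(finaltable)
```

On this subset both columns should hold 0.9064 for AAL_ALK, 0.2793 for AAL_CAR, and NA for AAL_CHRW, which is the pairing the question expects.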
