-2

I have 2 data tables and am trying to get a column cor from cortable into finaltable.

cortable

cor,tickerkey
0.7539,AAL_AAN
0.573,AAL_ABB
0.6384,AAL_ACM
0.7193,AAL_ACXM
0.8386,AAL_ADP
0.7392,AAL_ADT
0.732,AAL_AER
0.4805,AAL_AGCO
0.9363,AAL_AL
0.9064,AAL_ALK
0.7545,AAL_ALSN
0.8586,AAL_AME
0.3356,AAL_AMT
0.8239,AAL_AN
0.8637,AAL_AOS
0.7638,AAL_APD
0.7915,AAL_APH
0.8785,AAL_APOL
0.8073,AAL_ARMH
0.7744,AAL_ASH
0.4179,AAL_ATLS
0.8282,AAL_AWI
-0.2539,AAL_AWK
0.8213,AAL_AXLL
0.827,AAL_BA
0.8642,AAL_BC
0.7982,AAL_BCO
0.2002,AAL_BEAV
0.7079,AAL_BERY
0.858,AAL_BGC
0.5943,AAL_BRK.B
0.1522,AAL_BWC
0.2793,AAL_CAR
0.8537,AAL_CAT
0.9115,AAL_CBI

dput

cortable<-structure(list(cor = c("0.7539", "0.573", "0.6384", "0.7193", 
                                  "0.8386", "0.7392", "0.732", "0.4805", "0.9363", "0.9064", "0.7545", 
                                  "0.8586", "0.3356", "0.8239", "0.8637", "0.7638", "0.7915", "0.8785", 
                                  "0.8073", "0.7744", "0.4179", "0.8282", "-0.2539", "0.8213", 
                                  "0.827", "0.8642", "0.7982", "0.2002", "0.7079", "0.858", "0.5943", 
                                  "0.1522", "0.2793", "0.8537", "0.9115"), 
                          tickerkey = c("AAL_AAN", "AAL_ABB", "AAL_ACM", "AAL_ACXM", "AAL_ADP", "AAL_ADT", "AAL_AER", 
                                                                                         "AAL_AGCO", "AAL_AL", "AAL_ALK", "AAL_ALSN", "AAL_AME", "AAL_AMT", 
                                                                                         "AAL_AN", "AAL_AOS", "AAL_APD", "AAL_APH", "AAL_APOL", "AAL_ARMH", 
                                                                                         "AAL_ASH", "AAL_ATLS", "AAL_AWI", "AAL_AWK", "AAL_AXLL", "AAL_BA", 
                                                                                         "AAL_BC", "AAL_BCO", "AAL_BEAV", "AAL_BERY", "AAL_BGC", "AAL_BRK.B", 
                                                                                         "AAL_BWC", "AAL_CAR", "AAL_CAT", "AAL_CBI")), .Names = c("cor", 
                                                                                                                                                  "tickerkey"), row.names = c(NA, -35L), class = c("data.table", 
                                                                                                                                                                                                   "data.frame"), sorted = "tickerkey")

finaltable

tickerkey,ticker1,ticker2
AAL_ALK,AAL,ALK
AAL_CAR,AAL,CAR
AAL_CHRW,AAL,CHRW
AAL_CNW,AAL,CNW
AAL_CSX,AAL,CSX
AAL_DAL,AAL,DAL
AAL_EXPD,AAL,EXPD
AAL_FDX,AAL,FDX
AAL_HTZ,AAL,HTZ
AAL_JBHT,AAL,JBHT

dput

    finaltable<-structure(list(tickerkey = c("AAL_ALK", "AAL_CAR", "AAL_CHRW", 
                                             "AAL_CNW", "AAL_CSX", "AAL_DAL", "AAL_EXPD", "AAL_FDX", "AAL_HTZ", 
                                             "AAL_JBHT"), ticker1 = c("AAL", "AAL", "AAL", "AAL", "AAL", "AAL", 
                                                                      "AAL", "AAL", "AAL", "AAL"), ticker2 = c("ALK", "CAR", "CHRW", 
                                                                                                               "CNW", "CSX", "DAL", "EXPD", "FDX", "HTZ", "JBHT")), .Names = c("tickerkey", 
                                                                                                                                                                               "ticker1", "ticker2"), row.names = c(NA, -10L), class = c("data.table", 
                                                                                                                                                                                                                                         "data.frame"), sorted = "tickerkey")

I am trying to achieve that with the code as below.

setkey(cortable, "tickerkey")
setkey(finaltable, "tickerkey")

finaltable[cortable,cor:=cor,allow.cartesian=TRUE,nomatch=0]

The correct expected output would be finaltable

tickerkey,ticker1,ticker2,cor
AAL_ALK,AAL,ALK,0.9064
AAL_CAR,AAL,CAR,0.2793

with the rest of the rows having value of NA for cor

but it gives an output

finaltable

tickerkey,ticker1,ticker2,cor
AAL_ALK,AAL,ALK,0.2793
AAL_CAR,AAL,CAR,0.9064

with the rest of the rows NA for cor. and a warning on execution.. In [.data.table(finaltable, cortable, :=(cor, cor), allow.cartesian = TRUE, : Supplied 2 items to be assigned to 35 items of column 'cor' (recycled leaving remainder of 1 items).

If I remove nomatch argument, the mismatch doesn't happen. I am trying to find out exactly whats causing this behavior, as I have a lot of code/data analysis that uses this and without knowing the exact cause of a probable unexpected behavior, brings down my confidence in all the data generated so far.

I tried to look into the definition/behaviour of nomatch, didn't find much in the context of the above usage. If anyone could give some explanation, will be very helpful.

Dean MacGregor
  • 11,847
  • 9
  • 34
  • 72
  • 1
    Downvoting because I don't like the expectation that we read in csv data. Here's a reference for how to provide example data http://stackoverflow.com/a/28481250/1191259 Or just look at other questions in the R tag... – Frank Jul 09 '15 at 16:24
  • @Frank unless I'm misinterpreting the question, you shouldn't need the data. It appears they just need proper merge syntax. – Dean MacGregor Jul 09 '15 at 16:28
  • @DeanMacGregor: It is always good to test with the sample data before posting an answer. – user227710 Jul 09 '15 at 16:29
  • @user227710 I mean that is certainly preferable to no data and especially for complex cases but, in this particular case, it appears the answer is simple enough as to make testing unnecessary. – Dean MacGregor Jul 09 '15 at 16:33
  • @DeanMacGregor: You are right , but if you run your code, you will see that your answer doesn't match with the expected output. – user227710 Jul 09 '15 at 16:36
  • @user227710 what isn't matching? the column order? I'm assuming their "expected output" has just taken a subset of the rows and that they do want all the rows. – Dean MacGregor Jul 09 '15 at 16:42
  • @DeanMacGregor: Please test your code and see whether it matches with the expected output. – user227710 Jul 09 '15 at 16:47
  • I am retrieving the column as above, not doing a merge, because the original data set is much bigger, and a merge would be more computationally intensive. The mismatch part, the cor value is being assigned to the wrong row. If you look at AAL_ALK, the cor value in the original table is 0.9064, but its giving .2793. I tested this with the data, brought down the data to as minimal as possible so I could post here. I will try to get more acquainted with posting guidelines. – user2956863 Jul 09 '15 at 16:48
  • @user2956863: You are supposed to show the expected output for your "sample data", not the original data and also please `dput` the sample data for reproducibility. – user227710 Jul 09 '15 at 16:50
  • Hi, I showed the expected output and the observed output, below my sample code. I am not trying to find another way to achieve this. I know other ways to. I have a lot of existing code that uses nomatch and that generated a lot of data. Unless I know whats causing this issue I have to regenerate all that data by substituting all the instances utilizing nomatch argument, which would take considerable time. Thanks for your help. – user2956863 Jul 09 '15 at 16:53
  • I do think that there was a genuine behaviour observed contrary to expectations, and I tested it out well before posting. I hope a bit of inexperience in posting shouldn't make people downright the question and increase the chance the question go unanswered. Will definitely try to be more formatted in my posts. – user2956863 Jul 09 '15 at 17:03
  • : ` tickerkey ticker1 ticker2 cor 1: AAL_CAR AAL CAR NA 2: AAL_CHRW AAL CHRW NA 3: AAL_CNW AAL CNW NA 4: AAL_CSX AAL CSX NA 5: AAL_DAL AAL DAL NA 6: AAL_EXPD AAL EXPD NA 7: AAL_FDX AAL FDX NA 8: AAL_HTZ AAL HTZ NA 9: AAL_JBHT AAL JBHT NA 10: AAL_ALK AAL ALK 0.9064`. – user227710 Jul 09 '15 at 17:05
  • That is the output I got from your code. I am using development version 1.9.5+ – user227710 Jul 09 '15 at 17:05
  • Hi user227710, Thanks for taking the time to test out my issue. If you look at AAL_CAR it has a value of NA in your output, but 0.2793 in the original cortable, which is a different type of mismatch observed. The version I used was 1.9.4. and the output I had switched the 2 values 0.9064 and 0.2793 between AAL_ALK and AAL_CAR. Is my usage incorrect, conceptually? – user2956863 Jul 09 '15 at 17:35

1 Answers1

1

This should work:

merge(cortable, finaltable, by=c('tickerkey'))[,list(tickerkey,ticker1,ticker2,cor)]

or you could do

cortable[finaltable][!is.na(cor)][,list(tickerkey,ticker1,ticker2,cor)]

the latter approach assumes you've set keys whereas the first doesn't matter if the keys are set.

Dean MacGregor
  • 11,847
  • 9
  • 34
  • 72
  • Hi Dean, I am trying to find the bug in the usage I listed above, and what triggers that perceived bug or behavior. Because I have existing data that was generated using that code, I am trying to decide if I should regenerate the data, coz if mismatches like the above happened the data I have is incorrect. Since it takes a significant time to regenerate, I am bothering people to find out what exactly causes this bug/behavior so I can minimize the regeneration to those specific cases which encounter data which triggers that perceived bug. – user2956863 Jul 09 '15 at 17:57
  • @user2956863 I think I see what your problem is. I don't think setting new columns and `nomatch` was intended to be used simultaneously which is why you're getting the error about 35 vs 2 fields. `nomatch` correctly limits the results to 2 entries but finaltable[cortable] wants there to be 35. If you take a row out of `cortable` and do this again then you won't get the error because 2 is divisible into 34 and it just repeats the assignment over and over which is why they appear to offset. – Dean MacGregor Jul 09 '15 at 20:00
  • Hi Dean, Thanks so much for your time. That sheds light a bit in the direction I was looking for. So, I guess It was incorrect usage. I tested for the divisibility thing though after you mentioned, once rows(cortable) > rows(finaltable) and after it hits the issue while I keep increasing the numrows of cortable, the issue stays. Not sure if I can make sense of observations from an incorrect usage...Thanks again. – user2956863 Jul 09 '15 at 20:52
  • @user2956863 I don't think you can safely use any observations this way. By the way, would you mind accepting the answer? – Dean MacGregor Jul 09 '15 at 23:19