
I am trying to speed up existing code by converting data.frame syntax to data.table, since data.table is generally faster. I want to do a full outer join with data.tables, but I do not know the keys (the tables have many columns and the existing code never sets a key). How can I use data.table for that? The existing code is:

df<-merge(df, x, all = TRUE)

The data.table join syntax I know is `DT[X,]`, which requires keys. How can I write this join as a data.table operation without knowing the keys? Is there any other way to speed up the performance?
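For reference, a full outer join in data.table does not require `setkey()`; `merge()` accepts an explicit `by` argument. A minimal sketch with invented example tables `a` and `b` sharing a hypothetical `id` column:

```r
library(data.table)

# Hypothetical example tables; 'id' is an assumed shared column.
a <- data.table(id = c(1L, 2L, 3L), x = c("a", "b", "c"))
b <- data.table(id = c(2L, 3L, 4L), y = c(10, 20, 30))

# Full outer join: 'by' is given explicitly, so no setkey() is needed.
res <- merge(a, b, by = "id", all = TRUE)

# The bracket syntax also works keyless via 'on', but a[b, on = "id"]
# keeps all rows of b (a right join); for a full outer join, merge()
# with all = TRUE is the simplest route.
```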

M--
jkat
    It's hard to know how to specifically help you without seeing a small example of your data, or a way to replicate your data. Plus, what do you mean by a "faster solution"? What solution do you currently have, and how fast is it currently? – SymbolixAU Jul 10 '19 at 00:14
    you can always set the keys yourself. dev data.table implemented natural joins, but you could also do that yourself using intersect(). the answer below is also correct -- under the hood, merge.data.table is using the "proper" [ construction to do the join types – MichaelChirico Jul 10 '19 at 00:33
  • Please read [this FAQ](https://stackoverflow.com/a/5963610/6574038) to learn how to make a minimal reproducible example. – jay.sf Jul 10 '19 at 08:56

1 Answer


`merge` on data.table objects already dispatches to data.table's optimized `merge.data.table` method, so there is not much left to tune. In my experience data.table's merge operations are blazing fast.

One approach is to merge on integer or factor key columns, which should be considerably faster than merging on character columns.
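A minimal sketch of that idea, assuming a shared character key column `k` (the table names and columns here are invented for illustration): map the character key to integer codes over the union of its values, then merge on the codes.

```r
library(data.table)

# Hypothetical tables with a character key column 'k'.
a <- data.table(k = c("apple", "banana", "cherry"), x = 1:3)
b <- data.table(k = c("banana", "cherry", "date"),  y = 4:6)

# Replace the character key with integer codes computed over the
# union of both tables' key values, so the codes agree across tables.
lev <- union(a$k, b$k)
a[, k_id := match(k, lev)]
b[, k_id := match(k, lev)]

# Full outer join on the integer codes instead of the character column.
res <- merge(a, b, by = "k_id", all = TRUE)
```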

Daniel Fischer