I have a second question about data.table. As far as I understand, merges are called joins in data.table. How can I control which type of merge is performed (one-to-one, many-to-one, one-to-many), and whether the variables in the 'using' dataset replace the variables in the master dataset?

Also, if keys are necessary in order to perform the merge, and I have to do more than one merge on my data, do I have to keep changing the keys? That does not seem very clean to me.

Thank you in advance, Matteo

Matt Dowle
matric
  • I voted to close as not a statistical question (see comment on your other question). But it is a reasonable question for somewhere else - Stack Overflow or R-help. Are you using the data.table package? You should say so explicitly. There's reasonable documentation of the data.table package eg at http://datatable.r-forge.r-project.org/datatable-faq.pdf - have you read this and are there things in it you want to understand? – Peter Ellis Mar 18 '13 at 06:21
  • You may also find http://stackoverflow.com/questions/2232699/r-how-to-do-a-data-table-merge-operation and http://stackoverflow.com/questions/9914734/translating-sql-joins-on-foreign-keys-to-r-data-table-syntax interesting – mnel Mar 25 '13 at 11:24
  • Please confirm if you have read the data.table FAQ, in particular FAQs 1.12 and 1.13. Note that since you tagged this question data.frame (only) originally, we didn't see it. Any question about R should be tagged R, and about data.table, tagged data.table as well please. It's a good question, but there is already quite a bit written on it. – Matt Dowle Mar 25 '13 at 14:09

1 Answer

You could try the merge() function. Its arguments let you define how your data.frames are merged; the relevant ones, as described in ?merge, are:

x, y: data frames, or objects to be coerced to one.

by, by.x, by.y: specifications of the columns used for merging. See 'Details'.

all: logical; all = L is shorthand for all.x = L and all.y = L, where L is either TRUE or FALSE.

all.x: logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output.

all.y: logical; analogous to all.x.
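
As a rough illustration of how all, all.x and all.y change the result, here is a minimal sketch in base R. The data frames master and using and their columns are made up for this example; they are not taken from the question.

```r
# Hypothetical example data: 'master' and 'using' share the key column 'id'.
master <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
using  <- data.frame(id = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

# Default (all = FALSE): keep only ids present in both data frames.
inner <- merge(master, using, by = "id")

# all.x = TRUE: keep every row of 'master'; unmatched rows get NA in 'y'.
left <- merge(master, using, by = "id", all.x = TRUE)

# all.y = TRUE: keep every row of 'using'; unmatched rows get NA in 'x'.
right <- merge(master, using, by = "id", all.y = TRUE)

# all = TRUE: keep every row from both sides.
full <- merge(master, using, by = "id", all = TRUE)

print(inner)
print(left)
print(right)
print(full)
```

Because the columns to match on are given through by, nothing has to be set up on the data frames beforehand, and a different by can be passed for each successive merge.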

Try ?merge for more information.
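
On the many-to-one part of the question: as far as I can tell, merge() has no switch for the type of match. It pairs every matching row, so a many-to-one merge simply repeats the 'one' side. It also does not overwrite columns: a non-key column that exists in both inputs is kept twice, with the suffixes .x and .y. The sketch below uses made-up data frames (visits, people) to show this, and one possible way to let the second data frame's values win by dropping the clashing column first.

```r
# Hypothetical data: several rows per 'id' in 'visits', one row per 'id' in 'people'.
visits <- data.frame(id = c(1, 1, 2), score = c(10, 20, 30))
people <- data.frame(id = c(1, 2), score = c(99, 98), name = c("Ann", "Bob"))

# Many-to-one merge: each row of 'visits' is paired with its row in 'people'.
# 'score' exists on both sides, so both copies are kept with suffixes.
m <- merge(visits, people, by = "id")
print(names(m))

# To let the values from 'people' replace those in 'visits', one option
# is to drop the clashing column from 'visits' before merging.
keep <- setdiff(names(visits), "score")
replaced <- merge(visits[keep], people, by = "id")
print(replaced)
```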

You can also have a look at the Quick-R page on merging.

Jake Fisher
Sander Van der Zeeuw