
I am trying to merge five related data.frames in R using merge() from the data.table package.

When I merge the data.frames, the result becomes huge, with many repeated ids (the rows themselves are still distinct).

However, I only want to report the outcome for the unique ids.

  • Should I delete the rows with repeated ids? That would throw away data.
  • How should I handle this?
Mathieu
szs
  • Please provide example input and expected output. – zx8754 Oct 01 '19 at 07:44
  • The data I am working with is huge, but here is a summary: data.frame1 (customers_info) has 128595 obs and 11 variables, while data.frame2 (transaction_info) has 1324566 obs and 7 variables. I am merging data.frame2 into data.frame1 on the id "customer_id", but the result becomes so large that R cannot even allocate the memory. – szs Oct 01 '19 at 07:55
  • It seems the key has duplicate values. How can I avoid them? – szs Oct 01 '19 at 08:15
  • First summarise on key values, or perform an update-join, or... or... It is impossible to answer without sample data. Please read: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – Wimpel Oct 01 '19 at 08:17
  • Hello, would love to help, but you'll need to be a bit more specific first. Try this page for some advice on how to distill your question into something more easily communicated: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – MichaelChirico Oct 02 '19 at 06:39
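
Since no sample data was posted, the suggestions in the comments (summarise on the key first, or use an update-join) can only be sketched on made-up data. The tables below reuse the names customers_info, transaction_info and customer_id from the comments, but their contents, the columns region and amount, and the summary columns n_transactions and total_amount are all hypothetical. The sketch also assumes the real tables can be converted with as.data.table() or setDT().

```r
library(data.table)

# Hypothetical toy data standing in for the real tables (contents invented)
customers_info <- data.table(
  customer_id = c(1L, 2L, 3L),
  region      = c("north", "south", "east")
)
transaction_info <- data.table(
  customer_id = c(1L, 1L, 2L, 2L, 2L),
  amount      = c(10, 20, 5, 5, 15)
)

# Check each side for repeated keys before merging:
# anyDuplicated() returns 0 when every key value is unique
anyDuplicated(customers_info$customer_id)
anyDuplicated(transaction_info$customer_id)

# Option 1: summarise transactions to one row per customer, then merge.
# The result has exactly one row per customer_id in customers_info.
transaction_summary <- transaction_info[
  , .(n_transactions = .N, total_amount = sum(amount)),
  by = customer_id
]
merged <- merge(customers_info, transaction_summary,
                by = "customer_id", all.x = TRUE)

# Option 2: update join. Adds a column to customers_info by reference,
# so no new (larger) table is allocated.
customers_info[transaction_summary, on = "customer_id",
               total_amount := i.total_amount]
```

Which summary makes sense (count, sum, latest transaction, ...) depends on the outcome being reported. If customer_id repeats on both sides, the merge is many-to-many and the row count multiplies, which may be what exhausts memory here.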
