
I have two datasets, A and B. They have different variables but share two common variables, ID and Date. I want to merge the two on ID and Date. However, when I merge them using this code:

 C<-merge(A, B, by = "date", "ID")

The output is a dataset C with 0 observations. What is going wrong here?

Update:

I used the following script to merge the data:

 C <- merge(A, B, by = c("date", "ID"), all = TRUE)

This merges A and B, but the number of observations more than doubles. A has 3733 observations and B has 3887 observations. After merging, C has 9689 observations, which seems wrong: the total should be about 3000. What is wrong here?

Thanks

  • Thank you, it works. However, there is another issue. Data A has 3733 obs. of 299 variables and data B has 3887 obs. of 119 variables. When I merge them into C using date and ID, I get 9689 observations. Since the dates and IDs are common to both A and B, the resulting dataset C should also have about 3000 observations, but it has far more. What could be the reason? – Hasan Sohail Aug 31 '22 at 09:05
  • I tried all = TRUE, but it stays the same: 9689 obs. of 416 variables. It should be around 3000. What else could be the issue here? – Hasan Sohail Aug 31 '22 at 10:04
  • There are probably some rows with multiple matches and some without a match that get appended because of the `all = TRUE`. You could try using `tidylog`'s joining functions (based on `dplyr`). They can help you figure out why new observations are created. Additionally, it's very difficult to help you without a reproducible example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. – harre Aug 31 '22 at 10:44
  • See also: [Why does merge result in more rows than original data?](https://stackoverflow.com/questions/24150765/why-does-merge-result-in-more-rows-than-original-data) – harre Aug 31 '22 at 10:47
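
The row inflation suggested in the comments can be reproduced on a small scale. The sketch below uses made-up data frames (the values and the extra columns `a` and `b` are assumptions, not the asker's data) in which the `(date, ID)` pair repeats in both tables. It shows that `merge()` returns every pairing of matching rows, and that `all = TRUE` then appends the unmatched rows from each side. Counting duplicated keys with `duplicated()` is one possible way to check whether this is what happens in the real A and B.

```r
# Toy data (hypothetical): the key pair (ID, date) repeats in both tables.
A <- data.frame(ID = c(1, 1, 2), date = c("d1", "d1", "d2"), a = 1:3)
B <- data.frame(ID = c(1, 1, 3), date = c("d1", "d1", "d3"), b = 4:6)

keys <- c("date", "ID")

# Count rows whose key combination already appeared earlier in the same table.
dup_A <- sum(duplicated(A[keys]))  # 1
dup_B <- sum(duplicated(B[keys]))  # 1

# Inner merge: the two (1, "d1") rows in A each pair with the two in B.
inner <- merge(A, B, by = keys)
nrow(inner)  # 4

# all = TRUE keeps those 4 rows and adds the unmatched row from each side.
outer <- merge(A, B, by = keys, all = TRUE)
nrow(outer)  # 6
```

If `dup_A` or `dup_B` is nonzero on the real data, the merged result can exceed the row count of either input, so the keys may need to be deduplicated or extended with another identifying column before merging.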
