Merge csv files in R

Question

I have 3 .csv files that I need to analyse in R. File one contains columns with user id and signupdate. File two contains columns with user id, purchase date and amount of purchases. File three contains columns with user id, message date and number of messages.

Please note that the order of the user id is not the same in each of the three files, thus cop.

Would love some help merging these files so that the large dataset has order user id, signupdate, purchase date, amount of purchases, message date and number of messages. Can't seem to find code to do this in R Thanks in advance

I'm assuming you know how to read them in separately. It would be a good idea to post some [sample data](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so it would be easier to offer specific suggestions. — MrFlick, Aug 21 '14 at 02:30
@MatthewLundberg: Thanks for correcting. One might do: df=merge(df1,df2); df=merge(df,df3) — rnso, Aug 21 '14 at 03:09

Matthew Lundberg · Answer 1 · 2014-08-21T02:49:54.463

While merge doesn't take three arguments, Reduce is made for the task of iterating over a list and passing pairs to a function. Here's an example of a three-way merge:

d1 <- data.frame(id=letters[1:3], x=2:4)
d2 <- data.frame(id=letters[3:1], y=5:7)
d3 <- data.frame(id=c('b', 'c', 'a'), z=c(5,6,8))
Reduce(merge, list(d1, d2, d3))
##   id x y z
## 1  a 2 7 8
## 2  b 3 6 5
## 3  c 4 5 6

Note that the order of the column id is not the same, but the values are matched.

In the case where you have non-matching data and want to keep all possible rows, you need an outer join, by supplying all=TRUE to merge. As Reduce does not have a way to pass additional arguments to the function, a new function must be created to call merge:

d1 <- data.frame(id=letters[1:3], x=2:4)
d2 <- data.frame(id=letters[3:1], y=5:7)
d3 <- data.frame(id=c('b', 'c', 'd'), z=c(5,6,8))
Reduce(function(x,y) merge(x,y,all=TRUE), list(d1, d2, d3))
##   id  x  y  z
## 1  a  2  7 NA
## 2  b  3  6  5
## 3  c  4  5  6
## 4  d NA NA  8

NA is used to indicate data in non-matched rows.

Hi, I just played with your sample and have a question. I changed one of the ids in d3 from a to d and ran the script. I did not see rows for id a and d in the outcome. If you want to preserve all ids in the outcome, what could we do? — jazzurro, Aug 21 '14 at 02:43

Merge csv files in R

1 Answers1