Join 2 dataframes, while keeping exact left table

Question

I want to join two tables (timeseries) while keeping every left entry (and NOT adding ones from the right) and adding things from the right only when they exist left and right. Kind of a left inner join (which probably does not exist).

For example I wanna join A with B and want to get AB:

2 and 3 are NaN because 2 exists multiple times and 3 does not exist on the right side

Thank you for the help

Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. — Jaap, Sep 26 '16 at 15:15
Furthermore, you could start reading this Q&A: [How to join (merge) data frames (inner, outer, left, right)?](http://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) — Jaap, Sep 26 '16 at 15:16
Hey, I read all of the join and merge Q&As and couldnt find the answer. — Squeezie, Sep 26 '16 at 15:36

Pierre L · Answer 1 · 2016-09-26T16:07:09.020

You mention 'time-series' data in your question. That type of data can come in a general table format or a specific R ts object. The ts class is a special type of vector or matrix given special properties to assist in time-series analysis. We will assume for this approach that you have the latter case, an object of class(A) [1] "data.frame".

This is less a merge than a conditional match. We can test if there is one and only one match of the A values in B:

indx <- sapply(A$col1, function(x) sum(B$col1 %in% x) == 1L)
data.frame(v1=A$col1, v2=ifelse(indx, B$col2, NaN), stringsAsFactors=FALSE)
#   v1  v2
# 1  1   X
# 2  2 NaN
# 3  3 NaN
# 4  4   Z

Note: Be sure to not fall into the factor trap. R automatically codes character values into class factor not into class character as you might expect at first. To avoid this issue use stringsAsFactors=FALSE,

#Data
A <- data.frame(col1=1:4L)
B <- data.frame(col1=c(1,2,2,4), col2=c("X", "Y", "Z", "Z"), stringsAsFactors=FALSE)

Tensibai · Accepted Answer · 2016-09-26T16:18:04.083

1

Another approach with cleanup and merge (left join) (probably far less efficient than @Pierre answer):

A <- structure(list(id = 1:4), .Names = "id", row.names = c(NA, -4L ), class = "data.frame")
B <- structure(list(id = c("1", "2", "2", "4", "5"), val = c("X", "Y", "Z", "Z", "ZZ")), .Names = c("id", "val"), row.names = c(NA, 5L), class = "data.frame")

# Cleanup duplicated id in B (could be used in the merge to avoid creating a new object, done here for ease of reading)
NB <- B[!B$id %in% B$id[duplicated(B$id)],]

# left merge of A and B on id
merge(A,NB,all.x=TRUE,by="id")

Gives:

  id  val
1  1    X
2  2 <NA>
3  3 <NA>
4  4    Z

edited Sep 26 '16 at 16:18

answered Sep 26 '16 at 16:07

Tensibai

15,557
1
37
57

Hey, I edited my question. it can happen, that B is longer then A, then I think we cant use ur answer. – Squeezie Sep 26 '16 at 16:15
@Squeezie you're thinking wrong. with edited data it still gives 4 rows (I.e: the same number of rows than A) – Tensibai Sep 26 '16 at 16:17
It perfectly did the job it should. Thank you! – Squeezie Sep 28 '16 at 20:09
@Squeezie glad it helped, but you should really try pierre's answer. I'm pretty sure it will be quicker and more efficient – Tensibai Sep 28 '16 at 20:13
Actually yours did the work in under 1 second for a ~300.000 length vector, while pierres took over 1 minute (if I did it right). – Squeezie Sep 28 '16 at 20:28

Join 2 dataframes, while keeping exact left table

2 Answers2