1

I know, I know.... Another merging Df question, please hear me out as I have searched SO for an answer on this but none has come.

I am merging two Df's, one smaller than the other, and doing a left merge, to match up the longer DF to the smaller DF.

This works well except for one issue, rows get added to the left (smaller) df when the right(longer) df has duplicates.

An Example:

Row<-c("a","b","c","d","e")
Data<-(1:5)
df1<-data.frame(Row,Data)

Row2<-c("a","b","b","c","d","e","f","g","h")
Data2<-(1:9)
df2<-data.frame(Row2,Data2)

names(df2)<-c("Row","Data2")

DATA<-merge(x = df1, y = df2, by = "Row", all.x = TRUE)

>DATA
  Row Data Data2
1   a    1     1
2   b    2     2
3   b    2     3
4   c    3     4
5   d    4     5
6   e    5     6

See the extra "b" row?, that is what I want to get rid of, I want to keep the left DF, but very strictly, as in if there are 5 rows in DF1, when merged I want there to only be 5 rows.

Like this...

  Row Data Data2
1   a    1     1
2   b    2     2
3   c    3     4
4   d    4     5
5   e    5     6

Where it only takes the first match and moves on.

I realize the merge function is only doing its job here, so is there another way to do this to get my expected result? OR is there a post-merge modification that should be done instead.

Thank you for your help and time.

Research: How to join (merge) data frames (inner, outer, left, right)?

deleting duplicates

Merging two data frames with different sizes and missing values

Chabo
  • 2,842
  • 3
  • 17
  • 32

3 Answers3

5

We can use the duplicated function as follows:

DATA[!duplicated(DATA$Row),]

  Row Data Data2
1   a    1     1
2   b    2     2
4   c    3     4
5   d    4     5
6   e    5     6
bouncyball
  • 10,631
  • 19
  • 31
3

It´s possible also like

merge(x = df1, y = df1[unique(df1$Row),], by = "Row", all.x = TRUE)
#  Row Data.x Data.y
#1   a      1      1
#2   b      2      2
#3   c      3      3
#4   d      4      4
#5   e      5      5
Patrik_P
  • 3,066
  • 3
  • 22
  • 39
0

Since you only want the first row and don't care what variables are chosen, then you can use this code (before you merge):

Row2<-c("a","b","b","c","d","e","f","g","h")
Data2<-(1:9)
df2<-data.frame(Row2,Data2)

library(dplyr)

df2 %>%
  group_by(Row2) %>%
  slice(1)
Katie
  • 362
  • 3
  • 14