-1

I am looking to merge two data frames. Both tables have a "Name" column, and I wish to use this column as the one to connect the two. However, when I run something like:

final <- merge(df1, df2, by = "Name")

All of the columns are merged, but I only get the data from df1. And if I run something like:

final <- merge(df1, df2, by = "Name", all.y = TRUE)

The merged table just gives all of the data from df2 but excludes data from df1.

Is there a way around this?

#reproducible example

Names <- c("John", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)

df1 <- data.frame(Names, Age)

Names <- c("John", "Bill", "Maria", "Ben", "Tina")
Cars<- c("Ford", "Kia", "VW", "Toyota", "SAAB")
Color<- c("Red", "Green", "Blue", "Orange", "White")

df2<-data.frame(Names, Cars, Color)

final<- merge( df1, df2, by="Names")

#Expected Result

view(final)
Names Age Cars Color 
John  23  Ford Red 
Bill  41  Kia  Green
Maria 32  VW   Blue 
Ben   58  Toyota Orange 
Tina  26  SAAB   White 

#Actual Result

Names Age Cars Color 
John  NA  Ford Red 
Bill  NA  Kia  Green
Maria NA  VW   Blue 
Ben   NA  Toyota Orange 
Tina  NA  SAAB   White

In this example, I would get the data from df2, but df1 values would show NA.

Is there any way around this?

MrFlick
  • 195,160
  • 17
  • 277
  • 295
natguy8
  • 41
  • 6
  • 5
    Hard to say without a reproducible example but try `merge(df1, df2, by = "Name", all.x = TRUE, all.y = TRUE)` – SamR Jun 29 '22 at 11:25
  • @SamR I have tried this, the output is NA results from df1 and all the data from df2. I am not sure why this is happening, all of my other merges have worked. Anything else I can try? – natguy8 Jun 29 '22 at 11:47
  • 2
    It is hard to help without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – SamR Jun 29 '22 at 11:49
  • Please provide `dput(head(df1))` and `dput(head(df2))` – Waldi Jun 29 '22 at 11:54
  • @SamR I added a reproducible example in the question! – natguy8 Jun 29 '22 at 12:02
  • Can you add what you expect your final output to be? You are merging a 5x2 with a 5x2 data frame on one column and getting a 5x3 data frame which is what I would expect. – SamR Jun 29 '22 at 12:07
  • @SamR final output is added as well! – natguy8 Jun 29 '22 at 12:17
  • 1
    I cannot reproduce the problem with the code you provided. If I run the code posted I already get the the desired output. Thus this doesn't exactly reproduce the problem. The problem is still likely with your actual data. – MrFlick Jun 29 '22 at 13:09
  • 1
    Thanks for updating the data since I last commented but like @MrFlick I am also getting the desired output when I run your code. I'm afraid this question is getting to be a bit of a mess now. – SamR Jun 29 '22 at 13:25

3 Answers3

1

your example works for me with little changes

library(tidyverse)

df1 <- tibble(
  name=c("John", "Bill", "Maria", "Ben", "Tina"), 
  age=c(23, 41, 32, 58, 26))

df2<-tibble(
  name=c("John", "Bill", "Maria", "Ben", "Tina"), 
  cars=c("Ford", "Kia", "VW", "Toyota", "SAAB"), 
  color=c("Red", "Green", "Blue", "Orange", "White"))

merge( df1, df2, by="name")

# 
   name age   cars  color
1   Ben  58 Toyota Orange
2  Bill  41    Kia  Green
3  John  23   Ford    Red
4 Maria  32     VW   Blue
5  Tina  26   SAAB  White

Here I use tidyverse cause it's kind of a standard alternative to data.frames. I'm stick with using lowercased column names -- one more code standard.

I couldn't reproduce the example because of the Color varible with wrong quotes. Could it be the cause?

SamR
  • 8,826
  • 3
  • 11
  • 33
Sergey Skripko
  • 336
  • 1
  • 8
  • I don't believe the quotes are the issue. Maybe it has something to do with the variables in df1. Because when I troubleshoot with different options, I usually get all of the data back from df2. – natguy8 Jun 29 '22 at 13:19
  • does my example work in your R studio? – Sergey Skripko Jun 29 '22 at 13:22
0

You could try using dplyr package, which is great for managing dataframes:

final <- full_join(df1, df2, by= "Name")

If the names of the columns are different just use: by= c("Name1" = "Name2")

If you only want to drop the rows that are not common you can have a look on right_join/left_join/inner_join

  • I have tried this, the output is NA results from df1 and all the data from df2. I am not sure why this is happening, all of my other merges have worked. Anything else I can try? – natguy8 Jun 29 '22 at 11:47
  • You should add a small piece of your data in your question, so anyone can try to reproduce the error. You could build two mock data frames as I did in this question: https://stackoverflow.com/questions/72800568/compute-mean-on-data-frame-columns-for-specific-rows-selected-from-list-elements/72800922#72800922 – Pau Clavell-Revelles Jun 29 '22 at 11:52
  • I added a reproducible example in the question! – natguy8 Jun 29 '22 at 12:03
  • 1
    Your example works just fine in my session, try adding `base::merge` just in case there is an overlap between packages. – Pau Clavell-Revelles Jun 29 '22 at 16:22
0

besides all.x and all.y there is also an all argument for merge.

Using your code: final <- merge(df1, df2, by = "Name", all = TRUE) should work.

The RDocumentation also explains how the arguments work for merge

Omniswitcher
  • 368
  • 3
  • 13
  • I tried all=TRUE as well. That didn’t seem to do the trick. I added expected output and actual output to the question. – natguy8 Jun 29 '22 at 12:18