Join two dataframe based on common column in R

Question

I would like to join two data frames (df1, df2) based on the common first column, however, I want the placement of each column from both data frames one after another as shown in the example:

INPUT df1
df1 <- data.frame(name = c("JMA", "PSA", "TAN", "POS"), HAN = c("0.2119", "-0.1726", "0.0002", "-1.036"), ENA = c("0.0736", "-1.6644", "0.0743", "-0.0605"), IUA = c("-0.2212", "-0.3352", "0.6301", "-0.2848"), MEA = c("0.3464", "0.4085", "0.5801", "0.3236"))

Input df2
df2 <- data.frame(name = c("JMA", "PSA", "TAN", "POS"), YHD = c("0.3874", "-0.209", "-0.0614", "-1.8067"), PUH = c("0.0784", "-0.572", "0.0819", "-0.0628"), OUR = c("0.2297", "0.279", "0.3106", "0.3368" ), NKA = c("-0.5446", "-2.324", "-0.572", "-0.3626"))


Output df
df <- data.frame(name = c("JMA", "PSA", "TAN", "POS"), HAN = c(0.2119, -0.1726, 2e-04, -1.036), YHD = c(0.3874, -0.209, -0.0614, -1.8067 ), ENA = c(0.0736, -1.6644, 0.0743, -0.0605), PUH = c(0.0784, -0.572, 0.0819, -0.0628), IUA = c(-0.2212, -0.3352, 0.6301, -0.2848 ), OUR = c(0.2297, 0.279, 0.3106, 0.3368), MEA = c(0.3464, 0.4085, 0.5801, 0.3236), NKA = c(-0.5446, -2.324, -0.572, -0.3626))

I know the join function but do not know how to place columns from both dataframe one by one for further analysis.

Please help. Thank you in advance. :)

`df1` and `df2` are identical, and many columns in `df2` are completely new. Please fix your sample data. — r2evans, Oct 07 '21 at 15:00
@r2evans Done, I am sorry how it slipped even after checking. Please look into the problem if you can help me. Thank you in advance. I just want the alternate placement of column like if df1 have column "A", "B", "C" and df2 column "D", "E", "F" then after merge I want "A", "D", "B", "E", "C", "F" order. — SUMIT, Oct 07 '21 at 15:14
How is the NIS row supposed to be derived? Perhaps all you need is to read `?cbind`? (I'm guessing there is a bind_cols or col_bind function in the tidyverse fork of R, but `cbind` is base R.) — IRTFM, Oct 07 '21 at 15:20
`full_join(df1, df2, by = "name")` is working, albeit (1) different column order, and (2) `df[5,]` which is new data. — r2evans, Oct 07 '21 at 15:24
@IRTFM Thank you for your help. That NIS row is wrong you can ignore see the updated question. Thanks — SUMIT, Oct 07 '21 at 15:26
gupta, I've marked this as a dupe of a couple of other questions that have great answers walking through various ways of joining (both within tidyverse and others). If that really doesn't work for you (despite `full_join(df1,df2)` working), please pipe up with clear code/results showing what's wrong with it. Otherwise, feel free to accept @neuron's answer, you can accept even when marked as a dupe. Thanks! — r2evans, Oct 07 '21 at 15:29
@r2evans I know what neuron did also like full_join or left-right join rbind cbind etc, but I want the placement of columns from both data frame one by one. Please look into the question again, I corrected the input and output data, your suggested question just talk about joining instead placement location of columns, I need help urgently. Please open the question. Thanks — SUMIT, Oct 07 '21 at 15:32

neuron · Answer 1 · 2021-10-07T15:26:43.567

1

Your desired df has 5 rows vs your two starting dataframes have 4 rows. I assume that is just a mistake.

This code should work to merge two dataframes based on a common column

df <- merge(df1,df2, by = "name")
df
  name     HAN     ENA     IUA    MEA     YHD     PUH    OUR     NKA
1  JMA  0.2119  0.0736 -0.2212 0.3464  0.3874  0.0784 0.2297 -0.5446
2  POS  -1.036 -0.0605 -0.2848 0.3236 -1.8067 -0.0628 0.3368 -0.3626
3  PSA -0.1726 -1.6644 -0.3352 0.4085  -0.209  -0.572  0.279  -2.324
4  TAN  0.0002  0.0743  0.6301 0.5801 -0.0614  0.0819 0.3106  -0.572

edited Oct 07 '21 at 15:26

answered Oct 07 '21 at 15:25

neuron

1,949
1
15
30

Yes it was by mistake I am correcting it. – SUMIT Oct 07 '21 at 15:26
I know this but I want placement of column from both dataframe one by one. Thanks – SUMIT Oct 07 '21 at 15:28

r2evans · Accepted Answer · 2021-10-07T15:41:38.827

1

library(dplyr)
df3 <- full_join(df1, df2, by="name")
cn3 <- colnames(df3)
df3 <- df3[,order(coalesce(match(cn3, names(df1)), match(cn3, names(df2))))]
identical(df, df3)
# [1] TRUE
df3
#   name     HAN     YHD     ENA     PUH     IUA    OUR    MEA     NKA
# 1  JMA  0.2119  0.3874  0.0736  0.0784 -0.2212 0.2297 0.3464 -0.5446
# 2  PSA -0.1726 -0.2090 -1.6644 -0.5720 -0.3352 0.2790 0.4085 -2.3240
# 3  TAN  0.0002 -0.0614  0.0743  0.0819  0.6301 0.3106 0.5801 -0.5720
# 4  POS -1.0360 -1.8067 -0.0605 -0.0628 -0.2848 0.3368 0.3236 -0.3626

edited Oct 07 '21 at 15:41

answered Oct 07 '21 at 15:37

r2evans

141,215
6
77
149

1

Thank you dear for your help. Actually, I was searching the solution for this problem from last 2 hours. Your answer is really helpful. Thanks a lot. :) God bless you – SUMIT Oct 07 '21 at 15:40

score 1 · Answer 3 · answered Oct 07 '21 at 16:13

1

We may use join from data.table

library(data.table)
setDT(df1)[df2, on = .(name)]
   name     HAN     ENA     IUA    MEA     YHD     PUH    OUR     NKA
1:  JMA  0.2119  0.0736 -0.2212 0.3464  0.3874  0.0784 0.2297 -0.5446
2:  PSA -0.1726 -1.6644 -0.3352 0.4085  -0.209  -0.572  0.279  -2.324
3:  TAN  0.0002  0.0743  0.6301 0.5801 -0.0614  0.0819 0.3106  -0.572
4:  POS  -1.036 -0.0605 -0.2848 0.3236 -1.8067 -0.0628 0.3368 -0.3626

answered Oct 07 '21 at 16:13

akrun

874,273
37
540
662

1

Thank you dear for your help. Thanks a lot. :) – SUMIT Oct 07 '21 at 20:10
1

@gupta I guess the data.table should be fast – akrun Oct 07 '21 at 20:10
1

I see, Thanks alot – SUMIT Oct 07 '21 at 20:11

Join two dataframe based on common column in R

3 Answers3