match two files based on one column with duplicate names

Question

I have two data frames in R, dif and df2:

dif

            TX_NAME   baseMean log2FoldChange    lfcSE      stat       pvalue
1  ENSMUST00000189941.1 2924.12770      -11.52662 1.225415 -9.406295 5.139318e-21
2  ENSMUST00000174759.7   87.20515      -22.23962 2.848984 -7.806160 5.895654e-15
3  ENSMUST00000202220.3 1858.64629      -13.83620 1.769124 -7.820928 5.243522e-15
4 ENSMUST00000064151.12   81.87098      -22.15462 2.849401 -7.775185 7.533750e-15
5  ENSMUST00000139264.1  100.04720      -22.42838 2.851911 -7.864335 3.710619e-15
6  ENSMUST00000080115.9   84.68359      -22.20991 2.848771 -7.796313 6.374197e-15
          padj
1 6.380052e-16
2 8.319140e-11
3 8.319140e-11
4 8.319140e-11
5 8.319140e-11
6 8.319140e-11

and

df2

                TX_NAME          NAME
1: ENSMUST00000193812.1 RP23-271O17.1
2: ENSMUST00000082908.1       Gm26206
3: ENSMUST00000162897.1          Xkr4
4: ENSMUST00000159265.1          Xkr4
5: ENSMUST00000070533.4          Xkr4
6: ENSMUST00000192857.1 RP23-317L18.1

I don't know how to match them on TX_NAME so that dif gets, for each TX_NAME, its related NAME from df2. I can't merge them because NAME in df2 has duplicates.

I'm not sure why having duplicates would prevent you from merging. What exactly is the output you desire? Also, it's better to share your data in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — MrFlick, Jan 11 '19 at 21:29
I was confused because the order of TX_NAME after the merge was not the same, but with your command it seems OK now. Thanks. — LKian, Jan 11 '19 at 22:13

score 0 · Accepted Answer · answered Jan 11 '19 at 21:45

A more detailed discussion of merging data frames is available here. For the documentation, see Merging Data or Merge Two Data Frames.

You can do the merge like this:

merge(dif, df2, by="TX_NAME")
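As a small illustration, here is a sketch with made-up data (the `tx…` IDs and `Gene…` names are placeholders, not values from the question). Duplicated values in NAME do not get in the way, because the join key is TX_NAME. Note also that `merge()` sorts the result by the key column by default, which is a likely reason the row order differs from the order in dif.

```r
# Toy stand-ins for the question's tables (all values are hypothetical).
dif <- data.frame(
  TX_NAME  = c("tx3", "tx1", "tx9"),
  baseMean = c(10.5, 20.1, 30.7),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  TX_NAME = c("tx1", "tx2", "tx3", "tx4"),
  NAME    = c("GeneA", "GeneA", "GeneB", "GeneC"),  # NAME has duplicates
  stringsAsFactors = FALSE
)

# Inner join on TX_NAME: only IDs present in both tables are kept,
# and the rows come back sorted by TX_NAME (sort = TRUE is the default).
inner <- merge(dif, df2, by = "TX_NAME")
print(inner)
```

Here `tx9` has no partner in df2, so it is absent from `inner`.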

However, this drops the rows that don't appear in both data frames. If you want to keep them and fill the missing cells with NA, use:

merge(dif, df2, by="TX_NAME", all=TRUE)
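If the goal is only to annotate dif, two variations may be closer to what is wanted. The sketch below again uses made-up placeholder data: `all.x = TRUE` keeps every row of dif without adding rows that exist only in df2, and `match()` adds the NAME column while leaving dif in its original row order. The `match()` approach assumes each TX_NAME occurs at most once in df2, since it only returns the first hit.

```r
# Toy stand-ins for the question's tables (all values are hypothetical).
dif <- data.frame(
  TX_NAME  = c("tx3", "tx1", "tx9"),
  baseMean = c(10.5, 20.1, 30.7),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  TX_NAME = c("tx1", "tx2", "tx3", "tx4"),
  NAME    = c("GeneA", "GeneA", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

# Full outer join: every TX_NAME from either table, NA where data is missing.
full <- merge(dif, df2, by = "TX_NAME", all = TRUE)

# Left join: every row of dif, NAME is NA when df2 has no matching TX_NAME.
left <- merge(dif, df2, by = "TX_NAME", all.x = TRUE)

# Order-preserving lookup: match() gives, for each dif$TX_NAME, the position
# of its first occurrence in df2$TX_NAME (NA if it is not there).
annotated <- dif
annotated$NAME <- df2$NAME[match(dif$TX_NAME, df2$TX_NAME)]
print(annotated)
```

With `match()`, no sorting or row duplication happens, so `annotated` always has exactly the rows of dif in the same order.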

Good work!
