I have two data frames in R: dif and df2

dif

            TX_NAME   baseMean log2FoldChange    lfcSE      stat       pvalue
1  ENSMUST00000189941.1 2924.12770      -11.52662 1.225415 -9.406295 5.139318e-21
2  ENSMUST00000174759.7   87.20515      -22.23962 2.848984 -7.806160 5.895654e-15
3  ENSMUST00000202220.3 1858.64629      -13.83620 1.769124 -7.820928 5.243522e-15
4 ENSMUST00000064151.12   81.87098      -22.15462 2.849401 -7.775185 7.533750e-15
5  ENSMUST00000139264.1  100.04720      -22.42838 2.851911 -7.864335 3.710619e-15
6  ENSMUST00000080115.9   84.68359      -22.20991 2.848771 -7.796313 6.374197e-15
          padj
1 6.380052e-16
2 8.319140e-11
3 8.319140e-11
4 8.319140e-11
5 8.319140e-11
6 8.319140e-11

and

df2

                TX_NAME          NAME
1: ENSMUST00000193812.1 RP23-271O17.1
2: ENSMUST00000082908.1       Gm26206
3: ENSMUST00000162897.1          Xkr4
4: ENSMUST00000159265.1          Xkr4
5: ENSMUST00000070533.4          Xkr4
6: ENSMUST00000192857.1 RP23-317L18.1

I don't know how to match them on TX_NAME so that dif ends up with each TX_NAME and its related NAME from df2. I can't merge them because NAME in df2 has duplicates.

pogibas
LKian
  • Have you tried `merge(df, df2, "TX_NAME")`? – pogibas Jan 11 '19 at 21:25
  • I'm not sure why having duplicates would prevent you from merging. What exactly is the output you desire? Also, it's better to share your data in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – MrFlick Jan 11 '19 at 21:29
  • I was confused because the order of TX_NAME was not the same after the merge, but with your command it seems OK now. Thanks. – LKian Jan 11 '19 at 22:13

1 Answer

A more detailed discussion of merging data frames is available here. For the documentation, see Merging Data or Merge Two Data Frames.

You can do the merge with:

merge(dif, df2, by="TX_NAME")

However, this drops the rows whose TX_NAME does not appear in both data frames. To keep every row and fill the missing cells with NA, use:

merge(dif, df2, by="TX_NAME", all=TRUE)
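
To make the difference between these calls concrete, here is a minimal sketch on toy data. The values and the names `GeneA` and `GeneB` are made up for illustration, and it assumes TX_NAME is unique within df2 (the duplicates mentioned in the question are in NAME, not in TX_NAME). It also shows one way to keep the original row order of dif, since `merge()` sorts the result by the key column by default.

```r
# Toy data modelled on the question; all values are hypothetical.
dif <- data.frame(
  TX_NAME = c("ENSMUST00000189941.1", "ENSMUST00000174759.7", "ENSMUST00000202220.3"),
  log2FoldChange = c(-11.5, -22.2, -13.8),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  TX_NAME = c("ENSMUST00000202220.3", "ENSMUST00000189941.1", "ENSMUST00000162897.1"),
  NAME = c("GeneA", "GeneB", "Xkr4"),
  stringsAsFactors = FALSE
)

# Inner join: keeps only TX_NAME values present in both data frames.
inner <- merge(dif, df2, by = "TX_NAME")

# Full outer join: keeps every row from both sides, filling gaps with NA.
full <- merge(dif, df2, by = "TX_NAME", all = TRUE)

# Left join: keeps every row of dif, with NA where df2 has no match.
left <- merge(dif, df2, by = "TX_NAME", all.x = TRUE)

# merge() reorders rows by the key. To annotate dif in its original
# row order, look up each TX_NAME in df2 with match() instead.
annotated <- dif
annotated$NAME <- df2$NAME[match(dif$TX_NAME, df2$TX_NAME)]
```

If the goal is only to add NAME to dif, the `all.x = TRUE` form or the `match()` lookup is probably closer to what the question asks for than `all = TRUE`, because neither adds rows that exist only in df2.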

Good work!