Add column based on primary key

Question

I have two tables where the 'ENSG' and 'Ensembl Gene ID' columns are primary keys. I'm interested in combining the average raw counts column to the first table based on matching primary keys.

Condition	ENSG	avg_rep
Untreated	ENSG00000281593	8.702268e-06
Treated	ENSG00000280987	5.444327e-06

Ensembl gene id	average raw counts
ENSG00000281593	3.635470e-04
ENSG00000280987	4.558828e-04

The end result looks something like this:

Condition	ENSG	avg_rep	average raw counts
Untreated	ENSG00000281593	8.702268e-06	3.635470e-04
Treated	ENSG00000280987	5.444327e-06	4.558828e-04

Ricardo Semião e Castro · Answer 1 · 2021-06-02T00:26:25.637

1

Using merge:

merge(df1, df2, by.x="ENSG", by.y="Ensemble gene id")

Or, as mentioned by Onyambu, using full_join:

full_join(df1, df2, c('ENSG'='Ensemble gene id'))

Where df1 is your first table, and df2 the second one

edited Jun 02 '21 at 00:26

answered Jun 02 '21 at 00:18

Ricardo Semião e Castro

4,366
1
8
27

Also include the full join code – Onyambu Jun 02 '21 at 00:19
What do you mean? @Onyambu – Ricardo Semião e Castro Jun 02 '21 at 00:22
1

`full_join(df1, df2, c('ENSG'='Ensemble gene id')` – Onyambu Jun 02 '21 at 00:23

score 1 · Answer 2 · answered Jun 02 '21 at 00:23

I think left_join() might do the trick.

library(tidyverse)

data_left <- data.frame(
  condition = c('untreated', 'treated'),
  ensg = c('a123', 'b456'),
  avg_rep = c(1,2)
)

data_right <- data.frame(
  ensg_gene_id = c('a123', 'b456'),
  average_raw_count = c(4,5)
)

result <- data_left %>%
  left_join(
    data_right,
    by = c('ensg' = 'ensg_gene_id')
  )

Add column based on primary key

2 Answers2