
I have two ExpressionSets whose expression matrices I'd like to merge into one called exprs.br.ov, with values across all samples for each gene (not all genes are present in both sets).

The first is exprs(br.samp), which contains 48107 rows (genes) and 3 columns (samples). The second is exprs(ov.samp), which has 49576 rows and 6 columns.

I've tried (takes several minutes on my laptop):

exprs.br.ov <- merge(exprs(br.samp), exprs(ov.samp))

I can send the full data sets, but here is a sample of what the two sets look like individually:

exprs(br.samp)[1000:1005,]
             GSM1686435 GSM1686439 GSM1686443
ILMN_1652079   2.598251   2.691751   2.660744
ILMN_1652081   2.615129   2.750116   2.692110
ILMN_1652082   3.355115   3.349804   3.359563
ILMN_1652085   3.356552   3.293744   3.416394
ILMN_1652088   2.604641   2.634033   2.705018
ILMN_1652098   2.636708   2.681400   2.668621


exprs(ov.samp)[1:5,]
             GSM780707 GSM780708 GSM780709 GSM780719 GSM780720 GSM780721
ILMN_1725881  5.844604  6.117963  5.894689  5.587485  5.808352  5.928565
ILMN_1910180  6.264897  5.767562  5.736104  6.449061  5.841978  5.651918
ILMN_1804174  5.568391  5.232546  5.788832  5.641904  5.392946  5.632815
ILMN_1796063 10.592653 10.549996 10.209368 10.702580 10.630577 10.485648
ILMN_1811966  6.183197  6.231567  6.173843  6.142019  6.120883  5.966730

I'd like to include only genes that are present in both sets, with 9 columns, one for each sample.

The result from merge() does not look right: it seems to return just a vector of gene names rather than the expression values for each sample.

MrFlick
  • What is the name of the package used? – Jonny Phelps Jun 18 '19 at 14:48
  • I do not know what ExpressionSets are, but if you can make them into data frames you can just use `full_join()`, `left_join()` or `right_join()` from `dplyr` – Bryan Adams Jun 18 '19 at 15:44
  • As @ahmad pointed out in his answer, it is best to specify which column you join on. Here the ID is stored in the row names rather than in a column, hence your problem. Running a function like `dput` on your expression matrices would let us give you quick, complete answers. See here for ways to create reproducible examples: [so_reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – cbo Jun 18 '19 at 15:58
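
To illustrate the point in the comments about the join key, here is a minimal sketch on toy data. The matrices `br` and `ov`, the gene names, and the values are made up and only stand in for exprs(br.samp) and exprs(ov.samp). When the two inputs share no column names, merge() has nothing to join on and returns the Cartesian product of the rows, which would likely explain the long run time on roughly 48,000 by 50,000 rows. Joining on the row names keeps only the shared genes.

```r
# Toy matrices (hypothetical values) standing in for the two expression matrices
br <- matrix(c(2.6, 2.7, 3.3, 2.5, 2.8, 3.4), nrow = 3,
             dimnames = list(c("gene_a", "gene_b", "gene_c"), c("br1", "br2")))
ov <- matrix(c(5.8, 6.2, 5.5, 10.5, 6.1, 5.7, 5.2, 10.6), nrow = 4,
             dimnames = list(c("gene_b", "gene_c", "gene_d", "gene_e"),
                             c("ov1", "ov2")))

# No shared column names: merge() returns every pairing of rows (3 * 4 = 12)
cross <- merge(br, ov)
dim(cross)

# Joining on the row names keeps only genes present in both matrices;
# the gene IDs end up in a column called "Row.names"
common <- merge(br, ov, by = "row.names")
common
```

In `common`, the gene IDs are an ordinary column, so they would need to be moved back into the row names if a matrix-like layout is wanted.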

1 Answer

Save the first expression matrix as a data frame g1, with the gene IDs in an "id" column:

g1 <- data.frame(exprs(br.samp))
g1$id <- rownames(g1)

Do the same for the second matrix, as g2:

g2 <- data.frame(exprs(ov.samp))
g2$id <- rownames(g2)

If the two arrays use the same gene IDs, you can merge them on the "id" column:

# all = FALSE keeps only genes present in both sets
mrg1 <- merge(x = g1, y = g2, by = "id", all = FALSE)

# all = TRUE keeps all genes; samples missing a gene get NA
mrg2 <- merge(x = g1, y = g2, by = "id", all = TRUE)
ahmad
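
As an alternative to the data-frame merge above, the following sketch stays with plain matrices: it intersects the row names and binds the matching rows side by side. It is not taken from the answer; the toy matrices `m1` and `m2` and their values are hypothetical and only stand in for exprs(br.samp) and exprs(ov.samp).

```r
# Toy matrices (hypothetical values) standing in for the two expression matrices
m1 <- matrix(c(2.6, 2.7, 3.3, 2.5, 2.8, 3.4), nrow = 3,
             dimnames = list(c("gene_a", "gene_b", "gene_c"), c("br1", "br2")))
m2 <- matrix(c(5.8, 6.2, 5.5, 10.5, 6.1, 5.7, 5.2, 10.6), nrow = 4,
             dimnames = list(c("gene_b", "gene_c", "gene_d", "gene_e"),
                             c("ov1", "ov2")))

# Gene IDs present in both matrices
shared <- intersect(rownames(m1), rownames(m2))

# Index both matrices by the shared IDs (same order) and bind the columns;
# drop = FALSE keeps a matrix even if only one gene is shared
combined <- cbind(m1[shared, , drop = FALSE], m2[shared, , drop = FALSE])
combined
```

The result stays a numeric matrix with gene IDs as row names, and no join is computed, so on the full data this should be considerably faster than merge(). Applied to the real data, the same pattern would use exprs(br.samp) and exprs(ov.samp) in place of `m1` and `m2`.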