I have two data frames:
df1
names target start end
Gene_1 chr5 1 345
Gene_2 chr1 1 678
Gene_3 chr4 1 909
Gene_4 chr48 1 876
Gene_5 chr8 1 432
Gene_6 chr9 1 556
Gene_7 chr12 1 345
df2
gene_names positions
Gene_1 221
Gene_2 34
Gene_2 444
Gene_2 324
Gene_3 99
Gene_3 232
Gene_4 221
Gene_4 334
Gene_4 390
Gene_6 200
Gene_7 146
df1 is much shorter than df2. The first column of df2 (gene_names) has repeated values, each with a different value in the second column (positions). It also lacks many of the values found in the corresponding column of df1 (names).
I want to merge them into df_new, which should contain gene_names and positions from df2 together with the related columns from df1 (target, start, end). The df1 information should be repeated whenever a value in gene_names appears two or more times.
I tried merge:
df_new <- merge(df2, df1, by.x = "gene_names", by.y = "names")
This gives me a result, but I am not sure it is correct. Can someone shed more light on this?
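For reference, here is a minimal, self-contained sketch of that merge call on the example data above. It assumes the key columns are plain character vectors and that df1 has exactly one row per gene. The sanity checks and the df_keep name are added here for illustration only. By default merge() does an inner join, so each df2 row picks up its matching df1 row, and df1 genes that never appear in df2 (Gene_5 here) are dropped.

```r
# Rebuild the two example data frames from the question.
df1 <- data.frame(
  names  = paste0("Gene_", 1:7),
  target = c("chr5", "chr1", "chr4", "chr48", "chr8", "chr9", "chr12"),
  start  = 1L,  # recycled to all 7 rows
  end    = c(345L, 678L, 909L, 876L, 432L, 556L, 345L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  gene_names = c("Gene_1", "Gene_2", "Gene_2", "Gene_2", "Gene_3", "Gene_3",
                 "Gene_4", "Gene_4", "Gene_4", "Gene_6", "Gene_7"),
  positions  = c(221L, 34L, 444L, 324L, 99L, 232L, 221L, 334L, 390L, 200L, 146L),
  stringsAsFactors = FALSE
)

# Inner join (the default): one output row per df2 row that has a match in df1.
df_new <- merge(df2, df1, by.x = "gene_names", by.y = "names")
print(df_new)

# Sanity checks (illustrative, assuming one row per gene in df1):
stopifnot(anyDuplicated(df1$names) == 0)        # no duplicated keys in df1
stopifnot(all(df2$gene_names %in% df1$names))   # every df2 gene has a match
stopifnot(nrow(df_new) == nrow(df2))            # so no df2 row was lost or multiplied

# all.x = TRUE would also keep df2 rows without a match in df1 (filled with NA).
df_keep <- merge(df2, df1, by.x = "gene_names", by.y = "names", all.x = TRUE)
```

If nrow(df_new) differs from nrow(df2) on the real data, the usual causes are df2 genes missing from df1 (fewer rows) or duplicated names in df1 (more rows). Also note that merge() sorts the result by the key column unless sort = FALSE is given.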
Intended output
df_new
gene_names positions target start end
Gene_1 221 chr5 1 345
Gene_2 34 chr1 1 678
Gene_2 444 chr1 1 678
Gene_2 324 chr1 1 678
Gene_3 99 chr4 1 909
Gene_3 232 chr4 1 909
Gene_4 221 chr48 1 876
Gene_4 334 chr48 1 876
Gene_4 390 chr48 1 876
Gene_6 200 chr9 1 556
Gene_7 146 chr12 1 345
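
The intended output keeps the rows in the same order as df2. Since merge() sorts by the key by default, one possible alternative when that order matters is a lookup with match(). This is only a sketch, assuming names is unique in df1; df_ordered is a name made up for the example.

```r
# Example data, read from the tables in the question.
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
names target start end
Gene_1 chr5 1 345
Gene_2 chr1 1 678
Gene_3 chr4 1 909
Gene_4 chr48 1 876
Gene_5 chr8 1 432
Gene_6 chr9 1 556
Gene_7 chr12 1 345
")
df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
gene_names positions
Gene_1 221
Gene_2 34
Gene_2 444
Gene_2 324
Gene_3 99
Gene_3 232
Gene_4 221
Gene_4 334
Gene_4 390
Gene_6 200
Gene_7 146
")

# For each df2 row, find the row of df1 with the same gene (NA if there is none).
idx <- match(df2$gene_names, df1$names)

# Bind the looked-up df1 columns next to df2, keeping df2's row order.
df_ordered <- cbind(df2, df1[idx, c("target", "start", "end")])
rownames(df_ordered) <- NULL  # drop the row names inherited from df1
print(df_ordered)
```

Unlike the default merge(), this keeps df2 rows that have no match in df1, with NA in the df1 columns, which is the same behaviour as all.x = TRUE.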