Overlap between values in a row in one dataframe and columns in other dataframe

Question

Dataframe (df1) has a column with variable of interest (V1), some of these values in that column correspond to column names in other data frame (df2).

I would need to find an overlap between values (rows) of that column in df1 and all the columns in df2.

    head(df1)
            V1   CHR  MAPINFO    Pval
            a  2   38067017   0.27
            c  2   38070880   0.29
            d  2   38073394   0.00
            e  2   38073443   0.00
            f  2   38073564   0.01

 head(df2)
   a        b           c          d           f

  -0.09      -0.08      -0.50       0.50       0.35
   0.00       0.00       0.40      -0.40      -0.85
   0.32       0.30       0.20       0.74       0.42
  -0.41      -0.52      -0.72      -0.90      -0.96
   1.30       1.30       1.10       1.10       1.20
  -1.12      -1.78      -1.40       1.40       1.20

For example, in the df2, there is no "e" and in df1 there is no "b". How could I only keep the ones that are present both in df1$V1 and all columns of df2?

In the end I would need intersect between both dataframes (values present only in both).

head(df1)
            V1   CHR  MAPINFO    Pval
            a  2   38067017   0.27
            c  2   38070880   0.29
            d  2   38073394   0.00
            f  2   38073564   0.01


head(df2)
   a              c          d           f

  -0.09         -0.50       0.50       0.35
   0.00          0.40      -0.40      -0.85
   0.32          0.20       0.74       0.42
  -0.41         -0.72      -0.90      -0.96
   1.30          1.10       1.10       1.20
  -1.12         -1.40       1.40       1.20

Since the real number of these columns is > ~1200, I can not filter one by one.

Is there another elegant way other than transpose?

Hey, your question is likely getting downvoted as it does not qualify as a minimum reproducible example along with any attempts to solve the problem. Read this and maybe share more information? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 — Tom Haddow, Jan 30 '19 at 14:42
The other big thing I would add is an example of the expected output as this is still not totally clear. From the data you have shared I think the expected output is `df1` but without the row that contains "e" as this is not present as a column name in `df2`? — Tom Haddow, Jan 30 '19 at 16:01
If I am still unclear, I would need both df1 and df2 but with values that are only present in both. My apologies, hope this is a bit clearer, thanks for the patience tho. — Irma, Jan 30 '19 at 16:07
Thanks a lot more clear now. I hadn't noticed that sometimes a column in df2 is not present in df1. — Tom Haddow, Jan 30 '19 at 16:27
My last hope is transpose, and introducing a list, then binding and checking for overlap between each df. :/ Thank you anyway. — Irma, Jan 30 '19 at 16:31

score 0 · Accepted Answer · answered Jan 30 '19 at 16:46

0

Base R solution:

df1 <- subset(df1, df1$V1 %in% names(df2))
df2 <- df2[,df1$V1]

answered Jan 30 '19 at 16:46

Tom Haddow

230
1
10

Thank youuuu!! That is much more elegant than the massacre I've made with transpose,lists etc.... – Irma Jan 30 '19 at 17:06

Overlap between values in a row in one dataframe and columns in other dataframe

1 Answers1