-2

Dataframe (df1) has a column with variable of interest (V1), some of these values in that column correspond to column names in other data frame (df2).

I would need to find an overlap between values (rows) of that column in df1 and all the columns in df2.

    head(df1)
            V1   CHR  MAPINFO    Pval
            a  2   38067017   0.27
            c  2   38070880   0.29
            d  2   38073394   0.00
            e  2   38073443   0.00
            f  2   38073564   0.01

 head(df2)
   a        b           c          d           f

  -0.09      -0.08      -0.50       0.50       0.35
   0.00       0.00       0.40      -0.40      -0.85
   0.32       0.30       0.20       0.74       0.42
  -0.41      -0.52      -0.72      -0.90      -0.96
   1.30       1.30       1.10       1.10       1.20
  -1.12      -1.78      -1.40       1.40       1.20

For example, in the df2, there is no "e" and in df1 there is no "b". How could I only keep the ones that are present both in df1$V1 and all columns of df2?

In the end I would need intersect between both dataframes (values present only in both).

head(df1)
            V1   CHR  MAPINFO    Pval
            a  2   38067017   0.27
            c  2   38070880   0.29
            d  2   38073394   0.00
            f  2   38073564   0.01


head(df2)
   a              c          d           f

  -0.09         -0.50       0.50       0.35
   0.00          0.40      -0.40      -0.85
   0.32          0.20       0.74       0.42
  -0.41         -0.72      -0.90      -0.96
   1.30          1.10       1.10       1.20
  -1.12         -1.40       1.40       1.20

Since the real number of these columns is > ~1200, I can not filter one by one.

Is there another elegant way other than transpose?

Dale K
  • 25,246
  • 15
  • 42
  • 71
Irma
  • 53
  • 5
  • Hey, your question is likely getting downvoted as it does not qualify as a minimum reproducible example along with any attempts to solve the problem. Read this and maybe share more information? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 – Tom Haddow Jan 30 '19 at 14:42
  • Thank you, still learning my way around stackoverflow :) – Irma Jan 30 '19 at 15:55
  • The other big thing I would add is an example of the expected output as this is still not totally clear. From the data you have shared I think the expected output is `df1` but without the row that contains "e" as this is not present as a column name in `df2`? – Tom Haddow Jan 30 '19 at 16:01
  • If I am still unclear, I would need both df1 and df2 but with values that are only present in both. My apologies, hope this is a bit clearer, thanks for the patience tho. – Irma Jan 30 '19 at 16:07
  • Thanks a lot more clear now. I hadn't noticed that sometimes a column in df2 is not present in df1. – Tom Haddow Jan 30 '19 at 16:27
  • My last hope is transpose, and introducing a list, then binding and checking for overlap between each df. :/ Thank you anyway. – Irma Jan 30 '19 at 16:31

1 Answers1

0

Base R solution:

df1 <- subset(df1, df1$V1 %in% names(df2))
df2 <- df2[,df1$V1]
Tom Haddow
  • 230
  • 1
  • 10
  • Thank youuuu!! That is much more elegant than the massacre I've made with transpose,lists etc.... – Irma Jan 30 '19 at 17:06