1

I have two large data frames df1 and df2. I just want to keep only those columns with df2 which match the first part of column names ( which represent companies names) of df1. That means I want to subset df2 by matching companies name from df1. The files are like this :

df1

date       Com1 - PI    Com3 - PI   Com6 - PI
1/1/2007    292782.4    101780.7    2406010
1/2/2007    292782.4    101780.7    2406010
1/3/2007    292782.4    101780.7    2406010

df2

date        Com1 - CV  Com2- CV  Com3 - CV   Com4 - CV   Com5- CV    Com6 - CV
6/6/1990    1.57         NA        5.3         NA          NA          4.51
6/7/1990    1.57         NA        5.2         NA          NA          4.51
6/8/1990    1.59         NA        5.4         NA          NA          4.51
6/11/1990   1.59         NA        5.2         NA          NA          4.51

Expected output (which is a subset of df2)

date      Com1 - CV     Com3 - CV       Com6 - CV
6/6/1990    1.57           5.3            4.51
6/7/1990    1.57           5.2            4.51
6/8/1990    1.59           5.4            4.51
6/11/1990   1.59           5.2            4.51

As a new learner of r , I would be grateful if you can help me in this regard.

Samima
  • 73
  • 6
  • 1
    Please provide your data in an easy-to-paste form using e.g. `dput` (see [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)). – Roman Luštrik Mar 24 '18 at 09:09
  • yes, it only creating df with column names row without other rows. Thank u though.@AaghazHussain – Samima Mar 24 '18 at 09:25

1 Answers1

1

Remove the part after the first word with regex and then use %in% to get an exact match

df2[sub("\\s*-\\s*.*", "", names(df2)) %in% sub("\\s*-\\s*.*", "", names(df1))]
#       date Com1 - CV Com3 - CV Com6 - CV
#1  6/6/1990      1.57       5.3      4.51
#2  6/7/1990      1.57       5.2      4.51
#3  6/8/1990      1.59       5.4      4.51
#4 6/11/1990      1.59       5.2      4.51
akrun
  • 874,273
  • 37
  • 540
  • 662