
I want to select columns (or make a subset) using the column names of a different data frame.

e.g.

Data frame A: 3 columns

  • column names: ab, cd, de

Data frame B: 10 columns

  • column names: n_ab, n_cd_e, n_de, ab, fg, n_ef, tt, yy, zz, n_a2

I want to make subsets of data frame B:

  1. data frame C with columns n_ab, n_cd_e, n_de, ab

  2. data frame D with column ab

How can I make data frames C and D?

I expected that I could subset B using the code below, but I couldn't, because contains() only seems to match a single literal string.

library(dplyr)
ge <- select(ge.n, contains('ge'))

3) How can I select columns (or make a subset) using a condition (like >=, %in%, ==, etc.)?

Thanks

JJJ
    Welcome to `stackoverflow`! Please edit your question to include your code and sample data in a proper format. – Ed_Gravy Aug 03 '22 at 12:24
    Please take a read of [this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to ask a good question. Currently it's unclear what you're asking. – user438383 Aug 03 '22 at 12:26

1 Answer


To create C, you can use grepl() with an OR pattern (the names joined by |) built from A's column names.

C = B[, grepl(paste0(names(A), collapse = "|"), names(B)), drop = FALSE]
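A minimal base-R sketch that breaks the one-liner into its steps. The data frames here are stand-ins with the same column names as in the question (the values are placeholders), so the block runs on its own. Because grepl() treats the pattern as a regular expression, a name in A containing characters such as `.` or `(` would be read as regex syntax; the last two lines show one way to match each name as literal text instead, by running grepl(..., fixed = TRUE) once per name and OR-ing the results with Reduce().

```r
# Stand-in data frames: only the column names matter, the values are placeholders
A <- data.frame(ab = 1, cd = 2, de = 3)
B <- as.data.frame(matrix(1:10, nrow = 1, dimnames = list(NULL,
  c("n_ab", "n_cd_e", "n_de", "ab", "fg", "n_ef", "tt", "yy", "zz", "n_a2"))))

# Step 1: one regular expression that matches any of A's names ("ab|cd|de")
pattern <- paste0(names(A), collapse = "|")

# Step 2: TRUE for each column of B whose name contains at least one of them
keep <- grepl(pattern, names(B))

# Step 3: subset; drop = FALSE keeps a data frame even if only one column matches
C <- B[, keep, drop = FALSE]

# Literal-text variant: one fixed-string grepl() per name, OR-ed together
keep_fixed <- Reduce(`|`, lapply(names(A), function(p) grepl(p, names(B), fixed = TRUE)))
C_fixed <- B[, keep_fixed, drop = FALSE]
```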

To create D, you can use %in% directly.

D = B[, names(B) %in% names(A), drop = FALSE]
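The same idea covers the name-based part of question 3: any logical vector with one entry per column of B can be used as the column index, so conditions built with ==, %in%, !, & and |, or a base helper such as startsWith(), all work the same way. A short sketch with placeholder values (only the names matter); the variable names are illustrative:

```r
# Stand-in data frames: only the column names matter, the values are placeholders
A <- data.frame(ab = 1, cd = 2, de = 3)
B <- as.data.frame(matrix(1:10, nrow = 1, dimnames = list(NULL,
  c("n_ab", "n_cd_e", "n_de", "ab", "fg", "n_ef", "tt", "yy", "zz", "n_a2"))))

# Each condition yields one TRUE/FALSE per column of B
in_A     <- names(B) %in% names(A)      # exact membership: only "ab"
is_fg    <- names(B) == "fg"            # exact match on a single name
starts_n <- startsWith(names(B), "n_")  # prefix test

D      <- B[, in_A, drop = FALSE]           # column ab
not_A  <- B[, !in_A, drop = FALSE]          # every column except ab
either <- B[, in_A | is_fg, drop = FALSE]   # columns ab and fg
n_cols <- B[, starts_n, drop = FALSE]       # the five n_ columns
```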

Outputs (C and D, respectively):

        n_ab     n_cd_e       n_de         ab
1 -0.4456620  0.4007715  1.7869131  0.7013559
2  1.2240818  0.1106827  0.4978505 -0.4727914
3  0.3598138 -0.5558411 -1.9666172 -1.0678237


          ab
1  0.7013559
2 -0.4727914
3 -1.0678237

Inputs:

set.seed(123)
A = setNames(as.data.frame(
  replicate(3,rnorm(3))),  c("ab","cd","de")
)
B = setNames(as.data.frame(
  replicate(10,rnorm(3))),  c("n_ab", "n_cd_e", "n_de", "ab", "fg", "n_ef", "tt", "yy", "zz", "n_a2")
)
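If the condition in question 3 is meant to apply to the column values rather than the column names (the question doesn't say, so this is an assumption), the same indexing works once you compute one TRUE/FALSE per column. The criterion used here, a column mean of at least 0, is purely hypothetical; swap in whatever test you need. The block rebuilds the inputs above so it runs on its own, and also shows Filter() as a shortcut and how a value condition combines with the name pattern.

```r
# Rebuild the inputs from the answer
set.seed(123)
A <- setNames(as.data.frame(replicate(3, rnorm(3))), c("ab", "cd", "de"))
B <- setNames(as.data.frame(replicate(10, rnorm(3))),
              c("n_ab", "n_cd_e", "n_de", "ab", "fg", "n_ef", "tt", "yy", "zz", "n_a2"))

# Hypothetical criterion: keep columns whose mean is >= 0 (one TRUE/FALSE per column)
keep_cols <- vapply(B, function(x) mean(x) >= 0, logical(1))
E <- B[, keep_cols, drop = FALSE]

# Equivalent shortcut: Filter() applies the predicate to each column of the data frame
E2 <- Filter(function(x) mean(x) >= 0, B)

# A value condition and a name condition combined with &
matches_A <- grepl(paste0(names(A), collapse = "|"), names(B))
both <- B[, keep_cols & matches_A, drop = FALSE]
```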
langtang