select text from multiple combinations of text within a dataframe R

Question

I want to subset data based on a text code that appears in numerous combinations throughout one column of a data frame. I first checked all the variations by creating a table:

 list <-  as.data.frame(table(EQP$col1))

I want to search the data frame for the text "EFC" (even when it is combined with other letters) and subset those rows into a new data frame.

I have looked through the question linked below, but it does not answer mine. I have also reviewed the tidytext package, but it does not seem to be the solution either.

How to Extract keywords from a Data Frame in R

Are you trying to just subset the rows or to do something else as well? Can you provide a [working example](https://stackoverflow.com/help/minimal-reproducible-example)? — Gallarus, Feb 04 '20 at 20:14
just subset the rows that have "EFC" (in any combination) in column 1 — sar, Feb 04 '20 at 20:21

Gallarus · Accepted Answer · 2020-02-04T20:38:13.023

1

You can simply use `grepl`.

Assuming your data.frame is called df and the column to subset on is col1:

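# Example data: some values contain "EFC", some do not
df <- data.frame(
    col1 = c("eraEFC", "dfs", "asdj, aslkj", "dlja,EFC,:LJ)"),
    stringsAsFactors = FALSE
)

# Keep rows where col1 contains "EFC"; drop = FALSE keeps the result a data.frame
df[grepl("EFC", df$col1), , drop = FALSE]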
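To make the subsetting step easier to follow, here is a minimal sketch that splits it into two parts: `grepl` returns one TRUE/FALSE per element, and that logical vector is then used as a row index. The names `hits`, `efc_rows` and `efc_any_case` are only illustrative, and the case-insensitive variant assumes that lower-case "efc" should also count as a match, which the question does not state.

```r
# Example data copied from the answer above
df <- data.frame(
    col1 = c("eraEFC", "dfs", "asdj, aslkj", "dlja,EFC,:LJ)"),
    stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per element of col1.
# fixed = TRUE treats "EFC" as a literal string rather than a regular expression.
hits <- grepl("EFC", df$col1, fixed = TRUE)
hits

# Use the logical vector as a row index; drop = FALSE keeps a data.frame
efc_rows <- df[hits, , drop = FALSE]
efc_rows

# Assumption: if case should be ignored, ignore.case = TRUE also matches "efc"
efc_any_case <- df[grepl("efc", df$col1, ignore.case = TRUE), , drop = FALSE]
efc_any_case
```

Both `fixed` and `ignore.case` are arguments of base R's `grepl`, so no package needs to be installed for this.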

edited Feb 04 '20 at 20:38

answered Feb 04 '20 at 20:29

Gallarus


Hi. I tried this - package ‘grepl’ is not available (for R version 3.6.1) – sar Feb 04 '20 at 20:31
`grepl` is not a package, it is a base R function. What did you try exactly? – Gallarus Feb 04 '20 at 20:33

score 1 · Answer 2 · answered Feb 04 '20 at 20:45


Besides the solution from Gallarus, another option would be:

library(stringr)
library(dplyr)
df %>% filter(str_detect(Var1, "EFC"))

As described by Sam Firke in this post:

Selecting rows where a column has a string like 'hsa..' (partial string match)
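A self-contained version of the dplyr/stringr approach might look like the sketch below. It assumes both packages are installed. The column is named `Var1` here because that is the name `as.data.frame(table(...))` gives its first column, which is presumably why the snippet above uses it; the example values are borrowed from the accepted answer, and `efc_rows` is an illustrative name.

```r
library(dplyr)
library(stringr)

# Example data (values borrowed from the accepted answer, column renamed to Var1)
df <- data.frame(
    Var1 = c("eraEFC", "dfs", "asdj, aslkj", "dlja,EFC,:LJ)"),
    stringsAsFactors = FALSE
)

# Keep only the rows whose Var1 contains "EFC" anywhere in the string
efc_rows <- df %>% filter(str_detect(Var1, "EFC"))
efc_rows
```

If the column in your data has a different name, such as col1, replace `Var1` accordingly.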


maarvd


What if you wanted to include more than one "text"? df %>% filter(str_detect(Var1, "EFC", "ADE")) doesn't work... – Ecg Nov 02 '20 at 14:15

Something like df %>% filter(str_detect(Var1, pattern = "EFC|ADE")) would work (if you want to return the rows of df containing either of those partial strings). – maarvd Nov 03 '20 at 14:29
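The same "either code" idea can be sketched in base R by building the alternation pattern from a vector of codes. The data below is hypothetical (the values "xxADE1" and "EFC-ADE" are made up for illustration), as are the names `codes`, `either` and `both`. Note that this assumes the codes contain no regex metacharacters; codes with characters such as `.` or `(` would need escaping first.

```r
# Hypothetical example data: "xxADE1" and "EFC-ADE" are invented for illustration
df <- data.frame(
    col1 = c("eraEFC", "dfs", "xxADE1", "EFC-ADE"),
    stringsAsFactors = FALSE
)

# Build the regex "EFC|ADE" from a vector of codes
codes <- c("EFC", "ADE")
pattern <- paste(codes, collapse = "|")

# Rows containing at least one of the codes
either <- df[grepl(pattern, df$col1), , drop = FALSE]
either

# Rows containing both codes: combine two separate matches with &
both <- df[grepl("EFC", df$col1) & grepl("ADE", df$col1), , drop = FALSE]
both
```

The `pattern` string can be passed to `str_detect` in the same way, since `|` means "or" in both regex engines.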
