0

I want to search my dataset for those values that some attributes from multiple columns.

For that, I found that I can use grep like so:

df <- read.csv('example.csv', header = TRUE, sep='\t')

df[grep("region+druggable", df$locus_type=="region", df$drug_binary==1),]

But when I run this, my output is the different column names. Why is this happening?

my dataframe is like this:
id    locus_type     drug_binary
1     pseudogene     1    
2     unknown        0
3     region         1
4     region         0
5     phenotype_only 1
6     region         1
...

So ideally, I would expect to get the 3rd and 6th row as a result of my query.

Mixalis
  • 532
  • 5
  • 17

3 Answers3

3

If you want to use base R, the correct syntax is the following:

df[grepl("region|druggable",df$locus_type) & df$drug_binary==1,]

Which gives the following ouput:

  id locus_type drug_binary
3  3     region           1
6  6     region           1

Since you want to combine logic vectors you need to use grepl that has a logic output. Also I assumed you wanted to check for locus type equal to region or druggable, the correct logic for the regex in grepl is the one I used above.

thepule
  • 1,721
  • 1
  • 12
  • 22
1

I like dplyr for its of readability

library(dplyr)

subdf <- filter(df, locus_type=="region", drug_binary==1)
Paul Hiemstra
  • 59,984
  • 12
  • 142
  • 149
Nate
  • 10,361
  • 3
  • 33
  • 40
0

sometimes it can be helpful to use the sqldf library.

?sqldf

SQL select on data frames

Description

SQL select on data frames

this is how you could get the result you need:

# load the sqldf library
# if you get error "Error in library(sqldf) : there is no package called sqldf"
# you can install it simply by typing
# install.packages('sqldf')   <-- please notice the quotes!
library(sqldf)

# load your input dataframe
input.dataframe <- read.csv('/tmp/data.csv', stringsAsFactors = F)

# of course it's a data.frame
class(input.dataframe)

# express your query in SQL terms
sql_statement <- "select * from mydf where locus_type='region' and drug_binary=1"

# create a new data.frame as output of a select statement
# please notice how the "mydf" data.frame automagically becomes a valid sqlite table
output.dataframe <- sqldf(sql_statement)

# the output of a sqldf 'select' statement is a data.frame, too
class(output.dataframe)

# print your output df
output.dataframe

id locus_type drug_binary
 3     region           1
 6     region           1