
I've read other articles, such as:

Selecting rows where a column has a string like 'hsa..' (partial string match)

How do I select variables in an R dataframe whose names contain a particular string?

Subset data to contain only columns whose names match a condition

but most of them cover simpler cases:

  1. they only have one string to match
  2. they only have one partial string to match

So I'm here to ask for help.

Let's say we have a sample data table like this:

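library(data.table)

# Build the table row by row (columns start out as V1, V2)
sample = data.table('Feb FY2016', 50)
sample = rbind(sample, list('Mar FY2017', 30))
sample = rbind(sample, list('Feb FY2017', 40))
sample = rbind(sample, list('Mar FY2016', 10))
# Rename the columns in place
setnames(sample, c('month', 'unit'))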

How can I subset the data so that it contains only the rows whose "month" column satisfies the following requirements:

  1. has the year 2016
  2. starts with either 'Mar' or 'Feb'

Thanks!

alwaysaskingquestions

1 Answer


Since `grep` returns the indices of the elements it matches, it returns the row numbers whose `month` matches the pattern, and those can be used directly for subsetting.

sample[grep('^(Feb|Mar).*2016$', sample$month),]

#         month unit
# 1: Feb FY2016   50
# 2: Mar FY2016   10
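To see exactly what `grep` hands to the subsetting bracket, here is a small sketch. It assumes the data.table package is installed, and it rebuilds the question's table in a single `data.table()` call as a shortcut for the `rbind` steps; the names `pattern` and `idx` are illustrative only.

```r
library(data.table)

# Same four rows as the question, built in one call
sample <- data.table(month = c('Feb FY2016', 'Mar FY2017', 'Feb FY2017', 'Mar FY2016'),
                     unit  = c(50, 30, 40, 10))

pattern <- '^(Feb|Mar).*2016$'

# grep returns the positions (row numbers) of the matching elements
idx <- grep(pattern, sample$month)
print(idx)                                        # [1] 1 4

# value = TRUE returns the matching strings instead of their positions
print(grep(pattern, sample$month, value = TRUE))  # [1] "Feb FY2016" "Mar FY2016"

# Subsetting by those positions keeps rows 1 and 4
print(sample[idx])
```

Because the result is an integer vector of row numbers, `sample[idx]` and `sample[idx, ]` select the same rows.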

The regex looks for

  • the start of the string `^`;
  • followed by Feb or Mar with `(Feb|Mar)`;
  • any character `.` repeated zero or more times `*`;
  • 2016 exactly;
  • followed by the end of the string `$`.
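As a cross-check of that breakdown, the sketch below compares the combined regex with the two requirements tested separately. It assumes the year always appears at the very end of the string (as in the sample data) and uses base R's `startsWith`/`endsWith` (R 3.3.0 or later); the last two strings in `months` are hypothetical edge cases added for illustration, not part of the question's data.

```r
# Question's values plus two hypothetical edge cases
months <- c('Feb FY2016', 'Mar FY2017', 'Feb FY2017', 'Mar FY2016',
            'Apr FY2016', 'Feb FY20161')

# Combined pattern from the answer
combined <- grepl('^(Feb|Mar).*2016$', months)

# Requirement 2: starts with Feb or Mar
starts_ok <- startsWith(months, 'Feb') | startsWith(months, 'Mar')
# Requirement 1: year 2016, assumed to sit at the end of the string
year_ok <- endsWith(months, '2016')

separate <- starts_ok & year_ok

print(months[combined])   # [1] "Feb FY2016" "Mar FY2016"
```

Both approaches reject 'Apr FY2016' (wrong month) and 'Feb FY20161' (does not end in 2016), which is what the `^` and `$` anchors are for.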
alistaire
  • Thank you so much! It works! But one question: why can't I use grepl()? I tried grepl() with the same input, and the outcome contains only the "Feb" ones. – alwaysaskingquestions Mar 16 '16 at 06:58
  • `grepl` works exactly the same for me. The only difference is that `grep` returns a vector of the indices of matches (unless `value = TRUE`), and `grepl` returns `TRUE` for matches and `FALSE` for everything else; either works for subsetting. – alistaire Mar 16 '16 at 07:03
  • Hmm, for some reason, when I use grepl I only retain the Feb values... but anyway, thanks very much for the help! :) – alwaysaskingquestions Mar 16 '16 at 07:10
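For reference, here is a sketch of the `grepl` version discussed in the comments, assuming the data.table package is installed and the same four sample rows. It relies on data.table evaluating column names directly inside `[ ]`, so `sample$` is not needed; `hits`, `by_logical` and `by_index` are illustrative names.

```r
library(data.table)

sample <- data.table(month = c('Feb FY2016', 'Mar FY2017', 'Feb FY2017', 'Mar FY2016'),
                     unit  = c(50, 30, 40, 10))
pattern <- '^(Feb|Mar).*2016$'

# grepl returns one TRUE/FALSE per element instead of positions
hits <- grepl(pattern, sample$month)
print(hits)   # [1]  TRUE FALSE FALSE  TRUE

# Inside [ ], data.table looks up `month` as a column
by_logical <- sample[grepl(pattern, month)]
by_index   <- sample[grep(pattern, month)]

print(by_logical)
```

Both calls keep the Feb FY2016 and Mar FY2016 rows, consistent with the answer's point that either function works for subsetting.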