I have been working with R for a few months now and still consider myself a beginner. Thanks to this community, I've already learned so much about R. I can't thank you enough for that.

Now, I have a question that keeps coming back to me and is so basic that I feel I should have solved it myself by now.

It is related to this question: filtering data frame based on NA on multiple columns

I have a data.frame that contains a variable number of columns with a specific string (e.g. "type") in their names.

Here is a simplified example:

data <- data.frame(name=c("aaa","bbb","ccc","ddd"), 
                   'type_01'=c("match", NA, NA, "match"),
                   'type_02'=c("part",NA,"match","match"),
                   'type_03'=c(NA,NA,NA,"part"))

> data
  name type_01 type_02 type_03
1  aaa   match    part    <NA>
2  bbb    <NA>    <NA>    <NA>
3  ccc    <NA>   match    <NA>
4  ddd   match   match    part

OK, I know that I can filter the rows with...

which(is.na(data$'type_01') & is.na(data$'type_02') & is.na(data$'type_03'))
[1] 2

But since the number of type columns varies in my data (sometimes up to 20), I would rather select them with something like ...

grep("type", names(data))
[1] 2 3 4 

... and apply the condition to all of the columns, without specifying them individually.

In the example here, I am looking for the NAs, but that might not always be the case.

Is there a simple way to apply a condition to multiple columns that share a common name pattern, without specifying them one by one?

oguz ismail
Nerdbert

1 Answer

You don't need to loop or apply anything. Continuing from your grep method,

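# indices of all columns whose name contains "type"
i1 <- grep("type", names(a))
# rows where the number of NAs equals the number of type columns
which(rowSums(is.na(a[i1])) == length(i1))
#[1] 2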

NOTE: I renamed your data frame to a, since data is already defined as a function in R.
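
Since the question mentions that the condition will not always be about NAs, here is a minimal sketch of how the same counting idea might carry over to other conditions. The variable names all_na, any_match and none_na are only illustrative, and the "match" condition is an assumed example rather than something the question asked for.

```r
# Example data from the question, stored as `a` to avoid masking data()
a <- data.frame(name    = c("aaa", "bbb", "ccc", "ddd"),
                type_01 = c("match", NA, NA, "match"),
                type_02 = c("part", NA, "match", "match"),
                type_03 = c(NA, NA, NA, "part"))

# indices of all columns whose name contains "type"
i1 <- grep("type", names(a))

# rows where every type column is NA (row 2)
all_na <- which(rowSums(is.na(a[i1])) == length(i1))

# rows where at least one type column equals "match" (rows 1, 3 and 4);
# na.rm = TRUE makes NA comparisons count as "no match"
any_match <- which(rowSums(a[i1] == "match", na.rm = TRUE) > 0)

# rows where no type column is NA (row 4)
none_na <- which(rowSums(is.na(a[i1])) == 0)
```

The pattern is the same each time: build a logical matrix from the selected columns, count the TRUE values per row with rowSums, and compare that count to what you expect (all columns, at least one, or none).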

Sotos
  • I knew it couldn't be that hard. I seriously didn't think about using rowSums just to apply a single condition to it. I was so fixated on the content of these columns that I missed that I could just count the columns I am interested in and compare the result to what I would expect. Thx a lot for the help... that was a real eye opener. I ticked the mark and thanks for the note. I name stuff 'data' quite often. Maybe I should change that in the future. – Nerdbert Jan 08 '18 at 14:53
  • You are welcome. Yes, naming objects after predefined functions is basically asking for bugs. – Sotos Jan 08 '18 at 14:54