
Is there a way to filter out columns based on some condition using dplyr? This is a bit confusing because it is the opposite of normal filtering.

I can't find anything directly applicable on SO. I found this and this, but they don't do quite the same thing.

Basically, instead of filtering out rows based on a column's value, I want to remove columns based on a row's value.

Here's an example using the following data frame:

df <- data.frame(aa = c("1", "a", "10.2", "12.1", "8.7"), 
                 ab = c("1", "b", "5.3", "8.1", "9.2"), 
                 ac = c("0", "a", "1.8", "21.5", "16.0"), 
                 ad = c("0", "b", "11.1", "15.9", "23.6"))

I know it's a strange data set and that the columns have data of varying types. This is actually the reason for the question. I'm trying to clean this up.

Here is a base R solution, using traditional subsetting, which returns columns "ab" and "ad":

df[, df[2,] == "b"]

Is there a way to accomplish this using dplyr? I tried using `filter`, `select`, and `subset` to no avail, but I might be using them incorrectly in this case.

hmhensen
  • @GeorgeWood Checked the link. Does not address using a condition. I can select manually. I'm trying to have R do it for me. – hmhensen Jun 22 '18 at 14:51
  • what's wrong with the baseR solution?? – tjebo Jun 22 '18 at 14:55
  • I would suggest to try to find a better title for the question, in order to make it better visible for search engines. Such as: Select columns containing string within rows. "Filter" is kind of reserved for filtering rows. Ideally, try to avoid using 'df' and similar, as those are baseR functions and you're messing up with other people's environments. – tjebo Jun 22 '18 at 15:08
  • @Tjebo I changed the title to reflect your concerns. However, I kept a reference to "filter." Although this question may not be about filtering, I think that the concept is similar and, therefore, people may search for this solution using that term. – hmhensen Jun 22 '18 at 17:21

2 Answers

You can use `select_if`, which is a scoped variant of `select`:

df %>%
  select_if(function(x) any(x == "b"))

#    ab   ad
# 1   1    0
# 2   b    b
# 3 5.3 11.1
# 4 8.1 15.9
# 5 9.2 23.6

Here, I supplied a function to find any column containing "b".
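
For what it's worth, the same predicate also works with the newer `where()` helper, which supersedes `select_if` (a minimal sketch, assuming dplyr >= 1.0.0):

library(dplyr)

# keep only the columns in which any value equals "b"
df %>%
  select(where(function(x) any(x == "b")))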

Edit based on your comment below:

df %>%
  mutate(row_n = 1:n()) %>%
  select_if(function(x) any(x == "b" & .$row_n == 2))

Here, we mutate a variable `row_n` containing the row number, then use it as an additional condition in the call to `select_if`.
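
If the check should be tied specifically to row two, a shorter variant is to index the second element of each column inside the predicate (a sketch under the same assumptions; no helper column needed):

library(dplyr)

# keep columns whose value in row 2 is "b"
df %>%
  select_if(function(x) x[2] == "b")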

George Wood
  • This is on the right track, but I need it to look at row two because there are other rows in my data set that will contain the same value. – hmhensen Jun 22 '18 at 15:38
  • Edited the answer to add a further condition for the row number. – George Wood Jun 22 '18 at 15:52
  • Nice solution. Works nicely. I made a slight adjustment that, I think, makes it simpler and more straightforward. I replaced the `row_n` with `rownames` to avoid having to `mutate`. `df %>% select_if(function(x) any(x == "b" & rownames(df) == 2))`. Of course, then we have to call `df` again, so it's a matter of preference. One could also `mutate` out the new column with `mutate(row_n = NULL)` at the end. – hmhensen Jun 22 '18 at 17:05

You can use the following method:

df <- df %>%
  select(ab, ad)

The nice thing about `select` is that you can also deselect columns, like this:

df <- df %>%
  select(-ab)

This will select all the columns except "ab". Hope this is what you're looking for.

gabzo
  • Thanks for trying, but this doesn't answer the question in that there is no condition applied. – hmhensen Jun 22 '18 at 14:49
  • You can use some more stuff inside `select`. For example: `select(contains("a"))`. For more info, have a look here: https://www.r-bloggers.com/the-complete-catalog-of-argument-variations-of-select-in-dplyr/ – gabzo Jun 22 '18 at 14:54
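
To illustrate that comment, here is a short sketch of name-based tidyselect helpers (note these match on column names, not on cell values, so they don't address the row-value condition in the question):

library(dplyr)

df %>% select(contains("a"))   # columns whose names contain "a" (here: all four)
df %>% select(ends_with("b"))  # columns whose names end with "b" (here: ab)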