How do I pick the rows in a data frame that have at least one variable with a non-missing value?

Question

In a data frame, I only want to keep rows in which at least one of the variables starting with DSDECOD is not empty. How can I do that?

It seems that the following code works:

ds_sub <- subset(ds_supp, (DSDECOD1 !="" | DSDECOD2 !="" |
    DSDECOD3 !="" | DSDECOD4 !=""))

But is there a simpler way, so that I don't have to write out all of the variables starting with DSDECOD?

This may be helpful: https://stackoverflow.com/questions/22850026/filter-rows-which-contain-a-certain-string — TheSciGuy, Mar 31 '20 at 15:53

score 1 · Accepted Answer · answered Mar 31 '20 at 16:08

Maybe using rowSums and grepl:

ds_supp[rowSums(ds_supp[, grepl("^DSDECOD", names(ds_supp))] != "") > 0, ]

  ID DSDECOD1 DSDECOD2 DSDECOD3 DSDECOD4
1  1                          B         
2  2        A                 A        A
3  3        B                          B
5  5        C                 C        C
6  6                          D        D

Data:

  ID DSDECOD1 DSDECOD2 DSDECOD3 DSDECOD4
1  1                          B         
2  2        A                 A        A
3  3        B                          B
4  4                                     # <- empty row
5  5        C                 C        C
6  6                          D        D

this works for me with adding 'na.rm=TRUE' in the rowSums function. — sunflower, Mar 31 '20 at 19:05
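
A minimal sketch of the point raised in the comment above: when the DSDECOD columns contain NA as well as empty strings, the comparison with "" returns NA and rowSums propagates it, so na.rm = TRUE is needed. The data frame ds_demo and its values are made up for illustration, and drop = FALSE is an added safeguard for the case where only one column matches.

```r
# Hypothetical data mixing empty strings and NA (not from the original post)
ds_demo <- data.frame(
  ID       = 1:4,
  DSDECOD1 = c("A", "", NA, ""),
  DSDECOD2 = c("", "", "B", NA),
  stringsAsFactors = FALSE
)

# Logical index of the columns whose names start with DSDECOD
dsd_cols <- grepl("^DSDECOD", names(ds_demo))

# Count non-empty values per row; NA comparisons are dropped via na.rm = TRUE
n_filled <- rowSums(ds_demo[, dsd_cols, drop = FALSE] != "", na.rm = TRUE)

# Keep rows with at least one non-empty, non-NA DSDECOD value
ds_keep <- ds_demo[n_filled > 0, ]
ds_keep
```

With this data, only the rows with ID 1 and 3 should remain, since rows 2 and 4 hold nothing but blanks and NA.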

bsuthersan · Answer 2 · 2020-03-31T16:59:08.623 · score 0

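You could try using select from dplyr and the remove_empty function from the janitor package. Note that remove_empty only treats NA as empty, so blank strings need to be converted to NA first, and that select keeps only the DSDECOD columns:

library(dplyr)

ds_supp %>%
  select(contains("DSDECOD")) %>%   # keep only the DSDECOD columns
  janitor::remove_empty("rows")     # drop rows in which every value is NA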
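
A rough sketch of how blank strings could be handled with this approach while still keeping the ID column: convert blanks to NA in a copy of the DSDECOD columns, let remove_empty drop the all-NA rows, and then use the surviving row names to subset the full data frame. The data frame ds_demo is made up for illustration and the janitor package is assumed to be installed.

```r
library(janitor)

# Hypothetical data (not from the original post)
ds_demo <- data.frame(
  ID       = 1:4,
  DSDECOD1 = c("A", "", "", "C"),
  DSDECOD2 = c("", "", "B", "C"),
  stringsAsFactors = FALSE
)

# Work on a copy of the DSDECOD columns only
dsd <- ds_demo[startsWith(names(ds_demo), "DSDECOD")]

# Turn empty strings into NA, because remove_empty() only looks at NA
dsd[] <- lapply(dsd, function(x) replace(x, x %in% "", NA))

# Drop rows where every DSDECOD column is NA
kept <- remove_empty(dsd, which = "rows")

# Subset the original data frame by the surviving row names
ds_keep <- ds_demo[rownames(kept), ]
ds_keep
```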

edited Mar 31 '20 at 16:59

answered Mar 31 '20 at 16:48


Chris Ruehlemann · Answer 3 · 2020-03-31T17:13:53.970

This regex solution works:

# Negate the match with ! rather than -which(): -which() drops every row when nothing matches
df[!grepl("\\d$", apply(df, 1, paste0, collapse = "")), ]

   id DSDECOD1 DSDECOD2 DSDECOD3
1   1                 A         
2   2        B                  
3   3                          A
4   4                 B         
8   8                          A
9   9                          B
10 10                          A

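This solution works by pasting each row together into a single string with paste0 and then dropping from the data frame those rows whose string ends ($) in a digit (\\d). Because id is the first column and holds the only digits, that happens only when all of the DSDECOD columns in the row are empty.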

Reproducible data:

df <- data.frame(
  id = 1:10,
  DSDECOD1 = c("", "B", rep("",8)),
  DSDECOD2 = c("A","","","B","","","","","",""),
  DSDECOD3 = c("", "", "A", "","","","", "A", "B", "A"))

df
   id DSDECOD1 DSDECOD2 DSDECOD3
1   1                 A         
2   2        B                  
3   3                          A
4   4                 B         
5   5                             # empty 
6   6                             # empty 
7   7                             # empty
8   8                          A
9   9                          B
10 10                          A
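
A possible variation on the same idea, for cases where the DSDECOD values themselves might end in a digit or where the last column is not a DSDECOD column: paste only the DSDECOD columns and keep the rows whose pasted string is non-empty. This is a sketch, not the answerer's method. It rebuilds the same df as above, the helper names dsd_cols, pasted and df_keep are made up, and it assumes the columns contain no NA (paste0 would turn NA into the string "NA").

```r
df <- data.frame(
  id = 1:10,
  DSDECOD1 = c("", "B", rep("", 8)),
  DSDECOD2 = c("A", "", "", "B", "", "", "", "", "", ""),
  DSDECOD3 = c("", "", "A", "", "", "", "", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# Positions of the columns whose names start with DSDECOD
dsd_cols <- grep("^DSDECOD", names(df))

# Paste the DSDECOD columns together row by row (assumes no NA values)
pasted <- do.call(paste0, unname(as.list(df[dsd_cols])))

# Keep rows where the pasted string has at least one character
df_keep <- df[nzchar(pasted), ]
df_keep
```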
