2

In a dataframe, I only want to keep rows that have at least one variables starting with DSDECOD is NOT empty. How can I do that?

it seems that following code works.

ds_sub <- subset(ds_supp, (DSDECOD1 !="" | DSDECOD2 !="" |
    DSDECOD3 !="" | DSDECOD4 !=""))

But is there simple way so that I don't have to write out all of the variables starting with DSDECOD?

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
sunflower
  • 53
  • 5
  • This may be helpful: https://stackoverflow.com/questions/22850026/filter-rows-which-contain-a-certain-string – TheSciGuy Mar 31 '20 at 15:53

3 Answers3

1

Maybe using rowSums and grepl:

ds_supp[rowSums(ds_supp[, grepl("^DSDECOD", names(ds_supp))]!="")>0,]

  ID DSDECOD1 DSDECOD2 DSDECOD3 DSDECOD4
1  1                          B         
2  2        A                 A        A
3  3        B                          B
5  5        C                 C        C
6  6                          D        D

Data:

  ID DSDECOD1 DSDECOD2 DSDECOD3 DSDECOD4
1  1                          B         
2  2        A                 A        A
3  3        B                          B
4  4                                     # <- empty row
5  5        C                 C        C
6  6                          D        D
Edward
  • 10,360
  • 2
  • 11
  • 26
0

You could try using select and the remove_empty function from the janitor package?

ds_sub %>%
select(contains("DSDECOD")) %>%
janitor::remove_empty(.)
bsuthersan
  • 118
  • 5
0

This regex solution works:

df[-which(grepl("\\d$", apply(df, 1, paste0, collapse = ""))),]

   id DSDECOD1 DSDECOD2 DSDECOD3
1   1                 A         
2   2        B                  
3   3                          A
4   4                 B         
8   8                          A
9   9                          B
10 10                          A

This solution works by paste0ing the rows together and then subtracting from the dataframe those strings which end ($) on a digit (\\d), which happens only when the DSDECOD rows are empty:

Reproducible data:

df <- data.frame(
  id = 1:10,
  DSDECOD1 = c("", "B", rep("",8)),
  DSDECOD2 = c("A","","","B","","","","","",""),
  DSDECOD3 = c("", "", "A", "","","","", "A", "B", "A"))

df
   id DSDECOD1 DSDECOD2 DSDECOD3
1   1                 A         
2   2        B                  
3   3                          A
4   4                 B         
5   5                             # empty 
6   6                             # empty 
7   7                             # empty
8   8                          A
9   9                          B
10 10                          A
Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34