Classifying a database with filter/mutate and dplyr/tidyverse logic

Question

I am trying to classify a dataframe with cascading criteria using tidyverse logic (I am trying to learn it). I can do it with base R but not with the tidyverse. I found some examples that use a hybrid tidyverse + base R approach (with `subset`), but I can't find or understand how to do it using only the dplyr/tidyverse grammar (`filter`, `mutate`).

The problem is that, after subsetting for the first criterion (using `filter`), the dataframe contains only the filtered rows, so I cannot subset and classify by the remaining criteria. I could probably use a temporary df and `rbind()`, but I think there should be a more elegant way using only the tidyverse grammar. In short, I would like to update ONLY the rows matching my criteria, leaving all the other rows untouched in the original df, and do it with the dplyr grammar. Is that possible?

    # with base R
    mydata$mytype = "NA"
    mydata$mytype[which(mydata$field1 > 300)] = "type1"
    mydata$mytype[which(mydata$field1 <= 300 & mydata$field1 > 200)] = "type2"

    # with dplyr/tidyverse?
    library(tidyverse)
    mydata <- mydata %>% mutate(mytype = "NA")
    # filter() keeps only the rows with field1 > 300; every other row is dropped
    mydata <- mydata %>% filter(field1 > 300) %>% mutate(mytype = "type1")
    # 0 rows now: none of the remaining rows satisfies field1 <= 300
    mydata <- mydata %>% filter(field1 > 200, field1 <= 300) %>% mutate(mytype = "type2")

Are you aware of the `case_when` function in `dplyr`? See `?dplyr::case_when` — markdly, Jun 28 '18 at 23:10
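Following that suggestion, here is a minimal sketch with `case_when()`, assuming a small hypothetical `mydata` with a numeric `field1` column (the question did not include data). The conditions are checked in order and the first match wins, so the second test does not need an explicit `field1 <= 300`. The final `TRUE ~ NA_character_` plays the role of the base R default, but yields a real missing value instead of the string `"NA"`.

```r
library(dplyr)

# Hypothetical example data; the question did not provide any
mydata <- data.frame(field1 = c(100, 190, 250, 300, 310))

mydata <- mydata %>%
  mutate(mytype = case_when(
    field1 > 300 ~ "type1",        # checked first
    field1 > 200 ~ "type2",        # only reached when field1 <= 300
    TRUE         ~ NA_character_   # every remaining row
  ))

mydata$mytype
```

All rows stay in `mydata`; only the new `mytype` column depends on the conditions.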

score 0 · Answer 1 · answered Jun 28 '18 at 21:30


One option is to use `cut`:

    df$mytype <- cut(df$field1, breaks = c(-Inf, 200, 300, Inf),
                     labels = c("NA", "Type2", "Type1"))

Since the OP hasn't provided any data, here is the solution applied to a vector:

    cut(c(100, 190, 250, 260, 310), breaks = c(-Inf, 200, 300, Inf),
        labels = c("NA", "Type2", "Type1"))
    #[1] NA    NA    Type2 Type2 Type1
    #Levels: NA Type2 Type1
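`cut()` builds right-closed intervals by default (`right = TRUE`), so each break value belongs to the interval on its left. A quick check on values around the question's thresholds (200 and 300) shows that breaks placed exactly at the thresholds reproduce `field1 > 300` and `200 < field1 <= 300`. The label `"none"` is a hypothetical placeholder used only for this check.

```r
# Boundary values around the question's thresholds (200 and 300)
x <- c(200, 201, 300, 301)

# With the default right = TRUE the intervals are (-Inf, 200], (200, 300], (300, Inf]
res <- cut(x, breaks = c(-Inf, 200, 300, Inf),
           labels = c("none", "type2", "type1"))  # "none" is a hypothetical placeholder

as.character(res)
```

Passing `right = FALSE` would instead give left-closed intervals, which would match `>=` / `<` thresholds rather than the `>` / `<=` ones in the question.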

MKR

Thanks, that's another option. Since I am learning dplyr, though, I would like to understand how to update only the rows matching my criteria (via the dplyr grammar), with all the other rows left untouched in the original df. I don't know if that's possible – Marcello Del Bono Jun 28 '18 at 22:59
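To update only the matching rows while keeping every row in the data frame, one common dplyr idiom is to move the condition from `filter()` into `mutate()`: `if_else()` returns the new value where the condition is `TRUE` and the current value everywhere else. A minimal sketch, assuming a hypothetical `mydata` with a numeric `field1` column:

```r
library(dplyr)

# Hypothetical example data
mydata <- data.frame(field1 = c(100, 190, 250, 300, 310))

mydata <- mydata %>%
  mutate(mytype = NA_character_) %>%
  # Each step rewrites only the rows where the condition is TRUE;
  # all other rows keep their current mytype value
  mutate(mytype = if_else(field1 > 300, "type1", mytype)) %>%
  mutate(mytype = if_else(field1 > 200 & field1 <= 300, "type2", mytype))

nrow(mydata)   # no rows are dropped
mydata$mytype
```

Base R's `replace(mytype, field1 > 300, "type1")` can be used inside `mutate()` in the same way; `if_else()` additionally checks that both branches have the same type, which is why the column starts as `NA_character_` rather than a logical `NA`.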

score 0 · Answer 2 · answered Jun 29 '18 at 02:50


Using dplyr, you can:

1. Define the `breaks` for `field1` and their `labels`:

    # (200, 300] -> "type2", (300, Inf] -> "type1"; values <= 200 fall outside and become NA
    breaks <- c(200, 300, Inf)
    labels <- c("type2", "type1")

2. Create the new column with `mutate()`:

    library(dplyr)
    df <- df %>% mutate(category = cut(field1, breaks = breaks, labels = labels))
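A self-contained run of this approach on a small hypothetical data frame. The break and label choice below is an assumption picked to match the question's thresholds: values of `field1` at or below 200 fall outside every interval, so `cut()` returns `NA` for them, mirroring the question's unmatched rows.

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(field1 = c(100, 190, 250, 300, 310))

# Intervals: (200, 300] -> "type2", (300, Inf] -> "type1"; values <= 200 become NA
breaks <- c(200, 300, Inf)
labels <- c("type2", "type1")

df <- df %>% mutate(category = cut(field1, breaks = breaks, labels = labels))

as.character(df$category)
```

Note that `category` is a factor; wrap it in `as.character()` inside `mutate()` if a plain character column is preferred.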

Tarssio Barreto