
I'm trying to split my dataset (296 rows) in two: the first part contains the rows that match some conditions, and the second part is the rest of the dataset, i.e. the rows that do not match those conditions.

I did this and got 81 rows for the first part:

cardio = donnees %>%
  select(`Nausées/vomissements`,Vertige,Nystagmus,`Ataxie:Démarche ébrieuse`,`Motif si pas HINTS`,
         Alcool,Tabac,`atcd neuro`,Dyslipidémies,Diabète) %>%
  filter(Alcool == "Yes" |
         Tabac == "Yes"|
         `atcd neuro` == "3" |
         Dyslipidémies == "Yes"|
         Diabète == "Yes") 

Then I simply used "!" to get the rest, but unfortunately I only got 77 rows instead of the 215 expected:

donnees %>%
  select(`Nausées/vomissements`,Vertige,Nystagmus,`Ataxie:Démarche ébrieuse`,`Motif si pas HINTS`,
         Alcool,Tabac,`atcd neuro`,Dyslipidémies,Diabète) %>%
  filter(!(Alcool == "Yes" |
           Tabac == "Yes" |
           `atcd neuro` == "3" |
           Dyslipidémies == "Yes" |
           Diabète == "Yes" ))

Can someone help? Thanks a lot.

2 Answers


The anti_join method suggested by @MonJeanJean should work. But in case it doesn't, here is a slightly different approach: the idea is to create an index column and exclude whichever rows you don't need (reminiscent of MySQL days).

donnees$index = 1:nrow(donnees)

cardio = donnees %>%
  select(`Nausées/vomissements`,Vertige,Nystagmus,`Ataxie:Démarche ébrieuse`,`Motif si pas HINTS`,
         Alcool,Tabac,`atcd neuro`,Dyslipidémies,Diabète, index) %>%
  filter(Alcool == "Yes" |
         Tabac == "Yes"|
         `atcd neuro` == "3" |
         Dyslipidémies == "Yes"|
         Diabète == "Yes")

# Drop from the full data every row whose index ended up in cardio
cardio_required = donnees[-cardio$index, ]

This will give you the 215 remaining rows.
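As a rough sketch of the index idea on made-up data (the data frame `toy`, its values, and the names `subset_rows` and `rest` are hypothetical, not the asker's data), the complement is taken from the full data frame by keeping the indices that did not end up in the filtered subset:

```r
library(dplyr)

# Hypothetical toy data; one row has a missing value
toy <- data.frame(
  Alcool = c("Yes", "No", NA, "No"),
  Tabac  = c("No", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)
toy$index <- seq_len(nrow(toy))

# Rows matching at least one condition
subset_rows <- toy %>% filter(Alcool == "Yes" | Tabac == "Yes")

# Everything whose index is not in the subset, rows with NA included
rest <- toy[!(toy$index %in% subset_rows$index), ]
```

Using `%in%` rather than negative indexing is a design choice for the sketch: it still returns every row when the filtered subset happens to be empty.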

monte

Instead of negating the whole bracketed expression, you can replace each == with != and each | with & (De Morgan's laws):

donnees %>%
  select(`Nausées/vomissements`,Vertige,Nystagmus,`Ataxie:Démarche ébrieuse`,`Motif si pas HINTS`,
         Alcool,Tabac,`atcd neuro`,Dyslipidémies,Diabète) %>%
  filter(Alcool != "Yes" &
           Tabac != "Yes" &
           `atcd neuro` != "3" &
           Dyslipidémies != "Yes" &
           Diabète != "Yes" )
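One caveat worth checking: this form is logically the same as the negated version in the question, and `filter()` drops rows where the condition evaluates to `NA`. If some of the columns contain missing values (an assumption; the question does not show the data), both forms lose those rows, which could explain getting fewer rows than expected. A minimal sketch on a made-up data frame (`toy`, `flagged`, `rest` and the other object names are hypothetical):

```r
library(dplyr)

# Hypothetical toy data with missing values in the filter columns
toy <- tibble(
  id     = 1:6,
  Alcool = c("Yes", "No", NA, "No", NA, "No"),
  Tabac  = c("No", "Yes", "No", NA, NA, "No")
)

flagged <- toy %>% filter(Alcool == "Yes" | Tabac == "Yes")

# Both negations are equivalent and drop rows where the test is NA
neg_bracket  <- toy %>% filter(!(Alcool == "Yes" | Tabac == "Yes"))
neg_demorgan <- toy %>% filter(Alcool != "Yes" & Tabac != "Yes")

# NA-safe complement: %in% returns FALSE (never NA) for missing values
rest <- toy %>% filter(!(Alcool %in% "Yes" | Tabac %in% "Yes"))

nrow(flagged) + nrow(neg_bracket)  # smaller than nrow(toy) when NAs are present
nrow(flagged) + nrow(rest)         # equals nrow(toy)
```

Running `colSums(is.na(...))` on the filter columns of the real data would show whether this is what is happening.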

Edit: you can also use the anti_join function, which returns the rows of donnees that have no match in cardio:

cardio = donnees %>%
  select(`Nausées/vomissements`,Vertige,Nystagmus,`Ataxie:Démarche ébrieuse`,`Motif si pas HINTS`,
         Alcool,Tabac,`atcd neuro`,Dyslipidémies,Diabète) %>%
  filter(Alcool == "Yes" |
         Tabac == "Yes"|
         `atcd neuro` == "3" |
         Dyslipidémies == "Yes"|
         Diabète == "Yes")

others <- dplyr::anti_join(donnees, cardio)
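A small sketch of how `anti_join()` behaves on made-up data (`toy` and its `id` column are hypothetical). Passing `by` explicitly avoids relying on the automatic choice of shared columns; this assumes the data has, or is given, a column that uniquely identifies each row:

```r
library(dplyr)

# Hypothetical toy data; some filter columns are missing
toy <- tibble(
  id     = 1:5,
  Alcool = c("Yes", "No", NA, "No", "No"),
  Tabac  = c("No", "No", "No", "Yes", NA)
)

# Rows matching at least one condition
cardio <- toy %>% filter(Alcool == "Yes" | Tabac == "Yes")

# Rows of toy with no match in cardio, including rows where the filter was NA
others <- anti_join(toy, cardio, by = "id")
```

Because the join only asks which rows are absent from `cardio`, rows whose filter condition evaluated to `NA` land in `others`, so the two parts together cover the whole data frame.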
MonJeanJean