So, this is a rookie question.
I've got a data frame containing all responses to an online survey, 89 columns in total.
Since online surveys are sometimes filled out by people who don't really care and simply choose response values that are easy to click through, I would like to filter out implausible rows where someone simply hit the same extreme value over and over again.
I would like to filter out rows where these columns ALL have the value "9", or ALL have the value "1":
- 'sociald_ties_strong'
- 'sociald_ties_weak'
- 'sociald_ties_secondorder'
- 'sociald_identity_lifestyle'
- 'sociald_identity_politics'
- 'sociald_vertical_socialcapital'
- 'sociald_vertical_networkcapital'
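To make the goal concrete, here is a small made-up example (only three of the seven columns, for brevity; the data values are invented) showing which rows I consider implausible:

```r
library(dplyr)

# Toy data mimicking the survey structure (column names from my real data,
# values made up for illustration)
toy <- tibble(
  sociald_ties_strong      = c(9, 1, 5, 2),
  sociald_ties_weak        = c(9, 1, 3, 9),
  sociald_ties_secondorder = c(9, 1, 7, 1)
)

# Row 1 (all 9s) and row 2 (all 1s) are the straight-liners I want to drop;
# rows 3 and 4 have mixed answers and should be kept.
```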
So this is my code (with the tidyverse and dplyr packages loaded):
data_cleaned <- data_raw %>%
  filter(sociald_ties_strong == 9 &
         sociald_ties_weak == 9 &
         sociald_ties_secondorder == 9 &
         sociald_identity_lifestyle == 9 &
         sociald_identity_politics == 9 &
         sociald_vertical_socialcapital == 9 &
         sociald_vertical_networkcapital == 9) %>%
  filter(sociald_ties_strong == 1 &
         sociald_ties_weak == 1 &
         sociald_ties_secondorder == 1 &
         sociald_identity_lifestyle == 1 &
         sociald_identity_politics == 1 &
         sociald_vertical_socialcapital == 1 &
         sociald_vertical_networkcapital == 1)
But I seem to be missing something in my logic and/or syntax, as this filters out way too many rows.
My data cleaning will include more conditions like the ones above to exclude faulty or automated rows, but first I want to learn how to get this one right.
Maybe piping two (or more) filters together the way I did is not a good idea? Any suggestions welcome!
I expected to filter out only the few rows that meet all the conditions, perhaps 0.5% of the total observations.