I have a data frame with 286 columns and 157355 rows. I wish to subset the rows that contain one or more of several defined factor values, such as F32, F341 etc. Once that is done, I wish to identify which other factor values are most common in the subset rows.
I have tried to filter for the values of interest, but an error message appears saying the data must be numeric, logical or complex. For example:
d <- a %>%
  filter_at(vars(f.41202.0.0:f.41202.0.65), all_vars('F32'))
I also tried this, but the resulting data frame was empty:
f <- a %>%
  rowwise() %>%
  filter(any(c(1:280) %in% c('F32', 'F320', 'F321', 'F322', 'F323',
                             'F328', 'F329', 'F330', 'F331', 'F332',
                             'F333', 'F334', 'F338', 'F339')))
The same occurred when I tried placing all the relevant values into an ICD object:
f <- b %>%
  rowwise() %>%
  filter(any(c(1:286) %in% ICD))
I would greatly appreciate any suggestions, thanks.
My data looks like this:
Row.name  Var1  Var2  Var3  Var4
1         F3    NA    NA    M87
2         NA    NA    M87   NA
3         NA    F3    NA    K17
4         NA    NA    F3    M87
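In case it helps anyone reproduce this, the toy table above can be built roughly as follows. The object name `toy` is made up for this example (it is not my real data), and I am assuming the columns are factors, as they are in my real data frame:

```r
# Toy version of the table above; `toy` is a made-up name for illustration.
# Row names default to 1..4, matching the Row.name column.
toy <- data.frame(
  Var1 = c("F3", NA,    NA,    NA),
  Var2 = c(NA,   NA,    "F3",  NA),
  Var3 = c(NA,   "M87", NA,    "F3"),
  Var4 = c("M87", NA,   "K17", "M87"),
  stringsAsFactors = TRUE  # assumption: columns are factors
)
```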
After subsetting rows based on F3, it should look like this:
Row.name  Var1  Var2  Var3  Var4
1         F3    NA    NA    M87
3         NA    F3    NA    K17
4         NA    NA    F3    M87
So the same columns are retained, but rows without F3 are removed.
Then I would like to list the other values (anything other than F3) by how common they are within that subset. In this case that would be:
most common: M87
2nd most common: K17
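To make the target concrete, here is a rough base R sketch that gives the result I am after on the toy data. It is only an illustration of the intended outcome: the names `toy`, `codes`, `m`, `keep`, `sub` and `counts` are made up, the columns are assumed to be factors, and I have not checked how well this approach scales to the full 286 columns and 157355 rows.

```r
# Toy data (made-up name `toy`); columns assumed to be factors
toy <- data.frame(
  Var1 = c("F3", NA,    NA,    NA),
  Var2 = c(NA,   NA,    "F3",  NA),
  Var3 = c(NA,   "M87", NA,    "F3"),
  Var4 = c("M87", NA,   "K17", "M87"),
  stringsAsFactors = TRUE
)

codes <- c("F3")  # values of interest; a longer vector works the same way

# Convert to a character matrix so values are compared as text,
# not as the integer codes underlying the factors
m <- as.matrix(toy)

# %in% drops the matrix dimensions, so restore them before summing by row
hit  <- matrix(m %in% codes, nrow = nrow(m))
keep <- rowSums(hit) > 0

# Rows containing at least one value of interest, all columns retained
sub <- toy[keep, , drop = FALSE]

# Every other non-missing value in the kept rows, tallied and sorted
other  <- m[keep, , drop = FALSE]
other  <- other[!is.na(other) & !(other %in% codes)]
counts <- sort(table(other), decreasing = TRUE)

print(sub)
print(counts)
```

On the toy data, `sub` keeps rows 1, 3 and 4, and `counts` lists M87 (2 occurrences) ahead of K17 (1 occurrence).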
If it helps, I am trying to identify individuals with a particular disease, and then find out which other diseases those individuals most commonly have.
Thanks for the help.