Background: I have a data frame with 60,000 rows and 5 columns: pct (subject), nasc (birth date), sex, exam (14 exam types) and res (exam result).
> head(fim)
pct nasc sex exam res
1 ACF 11/09/1951 F ldl 81
2 ACF 11/09/1951 F colt 172
3 ACF 11/09/1951 F tg 152
4 ACF 11/09/1951 F ferr 28,1
5 ACF 11/09/1951 F fe 41
6 ACF 11/09/1951 F plq 256000
...
As you can see, each subject has at least 14 rows, one per exam, each with its result.
My problem is that I want to subset patients, together with their full set of exams, based on a single exam result. For example, I would like all subjects, with all of their exams, for whom exam1 == 15 or exam1 == "positive".
I have tried several approaches, and the only solution I can think of is to cast to wide format, select the patients, and reshape back to long. But when I use the cast function, all the values change:
library(reshape)
dfw <- cast(fim, pct ~ exam)
The long-to-wide reshape runs without error, but the original values are replaced by new ones. Can anyone help me with that, or suggest another way to subset the data?
> head(dfw)
pct hcv ldl colt cr ferr fe...
1 AFC R 73 157 9,56 1687,0 80
2 AAPS R 78 130 0,91 879,0 104
3 ASS R 96 151 0,76 666,2 138
4 ARS R 67 115 0,73 674,0 133
5 ARDS R 180 261 0,71 105,0 110
...
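For comparison, the long-to-wide step described above can be sketched with base R's reshape(), which copies the res values as they are rather than aggregating them. This is only a rough sketch under assumptions: the toy data frame and its values are made up, and only the column names pct, exam and res come from the data shown above. If a patient has more than one row for the same exam, reshape() keeps the first one and warns.

```r
# Toy long-format data (hypothetical values; column names follow the post)
toy <- data.frame(
  pct  = c("A", "A", "B", "B"),
  exam = c("ldl", "hcv", "ldl", "hcv"),
  res  = c("81", "R", "96", "NR"),
  stringsAsFactors = FALSE
)

# One row per patient, one res.<exam> column per exam type
wide <- reshape(toy, idvar = "pct", timevar = "exam", direction = "wide")

# Patients whose hcv result is "R", selected from the wide table
wide_keep <- wide$pct[wide$res.hcv == "R"]
```

The new columns are named res.ldl and res.hcv, so the original results can be checked against the long table before selecting on them.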
Solution:
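library(dplyr)  # provides filter(), group_by() and %>%
keep <- fim[fim$exam == "hcv" & fim$res == "R", "pct"]  # patients whose hcv result is "R" (long format)
fim <- fim[!duplicated(fim), ]  # drop exact duplicate rows
subset_fim <- filter(fim, pct %in% keep)  # every exam row for those patients
subset_fim %>% group_by(pct) %>% filter(!duplicated(exam))  # one row per exam within each patient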
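The same idea can be tried on a small example without any packages. The following is a minimal base R sketch, assuming a long-format data frame with the columns pct, exam and res. The toy data and its values are invented for illustration and are not taken from the real data set.

```r
# Toy long-format data; values are made up for illustration
toy <- data.frame(
  pct  = c("A", "A", "A", "B", "B", "C", "C"),
  exam = c("hcv", "ldl", "ldl", "hcv", "ldl", "hcv", "ldl"),
  res  = c("R", "81", "81", "NR", "96", "R", "67"),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows (the second "A ldl 81" row here)
toy <- toy[!duplicated(toy), ]

# Patients with an hcv row whose result is "R"
keep <- unique(toy$pct[toy$exam == "hcv" & toy$res == "R"])

# All exam rows for those patients, not only the hcv rows
result <- toy[toy$pct %in% keep, ]
```

Because the selection is done on the long table, no reshaping is needed and the res values are never touched.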