I have a data frame with diagnoses as columns (diagnosis 1 to 30) and patient ID numbers as rows. Each cell holds the diagnosis the patient received from the doctor.
I had a larger data frame on which I ran a TraMineR sequence analysis, and the result is the data frame described above. It looks like this:
- d1 (diagnosis 1) etc.
The diagnoses shown below are just examples.
   d1        d2  d3  d4  d5  d6  d7  etc.
1  cancer
2  cancer
3  nothing
4  nothing
5  cancer
6  headache
So I want to make a new data frame where I group all patients who have "cancer" as their first diagnosis, all patients who have "nothing" as their first diagnosis, and so on. This is because the data frame is too large and I want to reduce it that way.
Data example:
set.seed(1)
Data <- data.frame(
  d1 = sample(c("cancer", "cancer", "cancer", "cancer",
                "nothing", "cancer", "cancer", "cancer")),
  d2 = sample(c("cancer", "headache", "cancer", "cancer",
                "nothing", "nothing", "nothing", "nothing")),
  d3 = sample(c("cancer", "headache", "cancer", "cancer",
                "headache", "nothing", "nothing", "headache"))
)
Is that possible?
EXPECTED OUTCOME:
I expect an outcome where I can see the number of patients who had cancer as their first diagnosis, "nothing" as their first diagnosis, and so on. Maybe something like this:
          D1  D2  D3  D4  D5  ETC.
CANCER     5   4
HEADACHE   4   3
NOTHING    1   3
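For reference, here is a minimal base R sketch of the kind of result I am after, using the example data above. The names `diagnoses`, `counts`, and `groups` are just illustrative, and I am assuming that a plain count per diagnosis and column is enough. The numbers in my table above were made up, so they will not match what this produces.

```r
# Example data from above (sample() only shuffles, so counts per column are fixed)
set.seed(1)
Data <- data.frame(
  d1 = sample(c("cancer", "cancer", "cancer", "cancer",
                "nothing", "cancer", "cancer", "cancer")),
  d2 = sample(c("cancer", "headache", "cancer", "cancer",
                "nothing", "nothing", "nothing", "nothing")),
  d3 = sample(c("cancer", "headache", "cancer", "cancer",
                "headache", "nothing", "nothing", "headache"))
)

# Every diagnosis label that occurs anywhere in the data, in a fixed order
diagnoses <- sort(unique(unlist(lapply(Data, as.character))))

# For each column, count how often each diagnosis occurs (0 if it is absent)
counts <- sapply(Data, function(col) {
  as.vector(table(factor(col, levels = diagnoses)))
})
rownames(counts) <- diagnoses
counts

# One sub-data-frame per first diagnosis (illustrative grouping on d1)
groups <- split(Data, Data$d1)
nrow(groups$cancer)
```

With this example data, `counts` has one row per diagnosis and one column per `d1`, `d2`, `d3`, and `groups` is a list holding one smaller data frame for each value of `d1`.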