I have a dataset in R with information about individuals and diagnoses. The variables are group, age, weight, id and diagnosis. So an individual can have one row with X in diagnosis (meaning no diagnosis) or one or more rows with diagnoses. Now I want to make a new variable with the number of diagnoses each individual got so that each individual has one row in the dataset with the variables group, age, weight, id and number of diagnoses. In this new column with diagnosis I want individuals with no diagnosis to get the number 0, with one diagnosis the number 1, with two diagnoses the number 2 and etcetera. Can anyone help me?
I am using R. I tried to use group_by and count but I can not get the number 0 for individuals with no diagnosis (X in the diagnosis column) and I can not see the other variables like group, age and weight.
Here is the data:
pr <- read_csv("~/Desktop/Data.csv")
head(pr)
dput(pr)
structure(list(GROUP = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3,
3, 3, 3, 3, 4, 4, 4), AGE = c(23, 34, 61, 23, 45, 34, 34, 55,
56, 43, 56, 49, 61, 49, 74, 49, 51, 46, 75), WEIGHT = c(56, 72,
70, 56, 101, 72, 72, 62, 60, 78, 60, 55, 79, 55, 89, 55, 67,
60, 105), ID = c(4, 1, 2, 4, 3, 1, 1, 5, 7, 6, 7, 8, 9, 8, 10,
8, 11, 12, 13), DIAGNOSIS = c("J01", "J01", "X", "J01", "J01",
"J01", "J01", "J01", "J01", "J01", "J01", "J01", "X", "J01",
"J01", "J01", "X", "J01", "J01")), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -19L), spec = structure(list(
cols = list(GROUP = structure(list(), class = c("collector_double",
"collector")), AGE = structure(list(), class = c("collector_double",
"collector")), WEIGHT = structure(list(), class = c("collector_double",
"collector")), ID = structure(list(), class = c("collector_double",
"collector")), DIAGNOSIS = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
Picture of the desired output: Desired output