I apologize for the wording of the question and for any errors; I am a newbie in OS and in R.
Problem: I am looking for an efficient way to fill a column with numbers that uniquely identify observations sharing the same value in another column. The result would look like this:
   patient_number id
1              46  1
2              47  2
3              15  3
4              42  4
5              33  5
6              26  6
7              37  7
8               7  8
9              33  5
10             36  9
Sample data frame:
set.seed(42)
df <- data.frame(
  patient_number = sample(seq(1, 50, 1), 100, replace = TRUE)
)
What I was able to come up with:
df$id <- NA  ## create id and fill with NA to make the if statement easier
n_unique <- length(unique(df$patient_number))  ## how many unique patient numbers
for (i in seq_len(nrow(df))) {
  ## get indices of obs with the same patient_number
  index_identical <- which(df$patient_number == df$patient_number[i])
  if (any(is.na(df$id[index_identical]))) {
    ## if any of the ids of obs with the same patient number is not filled in,
    ## assign the smallest integer between 1 and the number of unique obs that is not used yet
    df$id[index_identical] <- setdiff(seq(1, n_unique, 1), df$id)[1]
  } else {
    df$id <- df$id  ## otherwise leave the ids unchanged (no-op)
  }
}
It does the job, but with thousands of rows it takes a long time, since every iteration scans the whole column again.
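For comparison, here is a minimal sketch of a loop-free way to get the same numbering. It assumes the ids only need to follow the order in which each patient number first appears, as in the expected result above. It uses base R's `match()` and `unique()`; the small vector and the name `toy` are illustrative and simply reuse the ten values from that expected result, so the outcome does not depend on the random number generator.

```r
# Hand-typed values taken from the expected result (illustrative input,
# not the random sample itself).
patient_number <- c(46, 47, 15, 42, 33, 26, 37, 7, 33, 36)

# unique() keeps values in order of first appearance; match() returns, for
# each element, its position in that vector, which serves as the id.
id <- match(patient_number, unique(patient_number))

# `toy` is a hypothetical name for the illustrative data frame.
toy <- data.frame(patient_number = patient_number, id = id)
print(toy)
```

On the sample data frame the same idea would read `df$id <- match(df$patient_number, unique(df$patient_number))`. I have not timed it, so whether it is fast enough for thousands of rows is still to be checked.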
Thanks for bearing with me.