I know the names of four individuals, and an interval within which each was born (given by the birth_low and birth_high columns):
> library(lubridate) # provides dmy()
> df <- data.frame(id = 1:4, name = c("john", "john", "leo", "anna"), birth_low = dmy(c("01/01/1978", "01/01/1978", "01/03/1979", "01/03/1979")), birth_high = dmy(c("31/12/1978", "31/12/1978", "30/03/1979", "01/04/1979")))
> df
id name birth_low birth_high
1 john 01/01/1978 31/12/1978
2 john 01/01/1978 31/12/1978
3 leo 01/03/1979 30/03/1979
4 anna 01/03/1979 01/04/1979
I need to write reproducible code that assigns a random date of birth (DoB) to each record. Other considerations require me to use a loop for this:
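> for (n in 1:nrow(df)) {
    set.seed(n)                                          # seed depends only on the row number
    date <- runif(1, df$birth_low[n], df$birth_high[n])  # dates are treated as days since 01/01/1970
    date <- ceiling(date)                                # round up to a whole number of days
    date <- dmy("01/01/1970") + date                     # convert the day count back to a date
    date <- format(date, "%d/%m/%Y")
    df$DoB[n] <- date
  }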
> df$DoB
[1] "07/04/1978" "09/03/1978" "05/03/1979" "19/03/1979"
An obvious issue with the code above is that it uses n to set the seed in every iteration. I will constantly be inputting new values, and if another person in df[1,] had the same values for birth_low and birth_high, then the same "random" date would be produced ("07/04/1978").
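To make the issue concrete, here is a minimal sketch. The helper draw_day is hypothetical (it is not part of my actual code) and simply repeats the seed-then-draw step from the loop, with the interval given as days since 01/01/1970:

```r
# Hypothetical helper: seed, then draw one day number from [low, high].
draw_day <- function(seed, low, high) {
  set.seed(seed)
  ceiling(runif(1, low, high))
}

# Interval from row 1, expressed as days since 1970-01-01.
low  <- as.numeric(as.Date("1978-01-01"))
high <- as.numeric(as.Date("1978-12-31"))

first_person  <- draw_day(1, low, high)  # whoever sits in row 1 today
second_person <- draw_day(1, low, high)  # a different person in row 1 later

identical(first_person, second_person)   # TRUE: same seed and bounds, same date
```

Because the seed is the row number, the draw is fully determined by the row position and the interval, not by who the person is.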
I thought of determining the seed through the length of the name or a combination of letters, but these alternatives yield a similar problem (e.g. every "john" in the first row will have the same seed). So the problem really is how to set the seed within a loop in a way that is independent from the data, yet still reproducible.
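As a rough illustration of why a name-derived seed does not help, the sketch below assumes the seed is simply the number of characters in the name (just one of the variants I considered):

```r
# Assumption: seed = number of characters in the name.
people <- c("john", "john", "leo", "anna")
seeds  <- nchar(people)
seeds  # 4 4 3 4

# One uniform draw per person, reseeding each time as in the loop.
draws <- vapply(seeds, function(s) { set.seed(s); runif(1) }, numeric(1))

draws[1] == draws[2]  # TRUE: both "john" records get the same draw
draws[1] == draws[4]  # TRUE: "anna" also has four letters
```

Any seed computed from the record itself collides whenever two records share the inputs to that computation.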
Any ideas?