
I want to anonymise a dataset by replacing the original date and time columns with new, randomized dates (from 01.01.2012 to 31.12.2015) and new, randomized times.

  • Format of the date column: %d.%m.%Y

  • Format of the time column: %H:%M

The data frame consists of 37,094 rows.

Any ideas?

Lutz

2 Answers


We can use seq.POSIXt with sampling for this.

# for reproducibility we set a seed.
set.seed(4242)

The sample size is set to the specified 37094 rows. The by argument of seq.POSIXt is given in seconds, so 60*15 produces a 15-minute grid. Change the 15 to whatever interval you like.

samplesdates <- sample(seq.POSIXt(as.POSIXct("2012-01-01 00:00"), as.POSIXct("2015-12-31 23:59"), by = 60*15), size = 37094, replace = TRUE)

newdates <- format(samplesdates, "%d.%m.%Y")
head(newdates)
[1] "11.12.2015" "23.05.2013" "01.12.2012" "04.09.2014" "23.10.2014" "27.09.2015"

newtimes <- format(samplesdates, "%H:%M")
head(newtimes)
[1] "17:00" "01:15" "21:15" "00:30" "19:30" "08:30"
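
To connect this back to the question, here is a minimal sketch of writing the sampled values into a data frame. The data frame df and its columns id, date and time are hypothetical stand-ins for the real dataset, and the time zone is fixed to UTC as an assumption so that daylight-saving gaps cannot affect the grid.

```r
set.seed(4242)

# Hypothetical stand-in for the real dataset (column names are assumptions)
n <- 37094
df <- data.frame(id = seq_len(n),
                 date = "01.01.2000",
                 time = "00:00",
                 stringsAsFactors = FALSE)

# 15-minute grid over the whole range, in UTC (assumption)
grid <- seq(as.POSIXct("2012-01-01 00:00", tz = "UTC"),
            as.POSIXct("2015-12-31 23:45", tz = "UTC"),
            by = 60 * 15)

# One random timestamp per row, drawn with replacement
stamps <- sample(grid, size = nrow(df), replace = TRUE)

# Overwrite the original columns with the formatted random values
df$date <- format(stamps, "%d.%m.%Y")
df$time <- format(stamps, "%H:%M")
```

Because date and time come from the same sampled timestamp, each row stays internally consistent.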
phiver

Here's a way that converts the range limits to Unix time, samples at random from that range and then converts back to a date-time. A bit of formatting is needed to get the required output.

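library(lubridate)
# range limits as Unix time (seconds since 1970-01-01 UTC)
start = as.integer(dmy_hms('01-01-2012 00:00:00'))
# end at 23:59:59 so that 31.12.2015 itself can be drawn
end = as.integer(dmy_hms('31-12-2015 23:59:59'))
# tz = 'UTC' matches dmy_hms, which parses in UTC by default
randomdates = as.POSIXct(runif(37094, start, end), origin = '1970-01-01', tz = 'UTC')
# %Y gives the four-digit year asked for in the question
randomdatepart = format(randomdates, '%d.%m.%Y')
randomtimepart = format(randomdates, '%H:%M')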
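
A possible variant, not taken from the answer above: if the date and the time do not need to come from a single timestamp, they can be drawn independently in base R, which sidesteps time zones altogether. The variable names below are illustrative only.

```r
set.seed(4242)
n <- 37094

# Every calendar day in the requested range
days <- seq(as.Date("2012-01-01"), as.Date("2015-12-31"), by = "day")
random_days <- sample(days, n, replace = TRUE)

# Minute of the day, 0 (00:00) to 1439 (23:59)
minutes <- sample(0:1439, n, replace = TRUE)

random_date <- format(random_days, "%d.%m.%Y")
random_time <- sprintf("%02d:%02d", minutes %/% 60L, minutes %% 60L)
```

Each day and each minute is equally likely here, so this behaves like the uniform draw above, only at one-minute resolution.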
Andrew Chisholm