0

I need your help. I would like to transform (re-calculate) a date column by randomly changing only the month and day and keeping only the year. This task aims to anonymize the date of birth for certain people.

For example: For this original date: '1996-07-04' to be '1996-12-01'. (let's say. it could be anything else instead of day and month).

OriginalDateBirth AnonymDatebirth
1996-07-04 1996-12-01
1955-01-01 1955-04-09

Do you think you can help me with a sql statement, function that can implement it?

Thanks :)

Colleric1
  • 23
  • 4
  • 1
    Tag only the database that you use. – forpas Feb 09 '22 at 16:05
  • 3
    Using your backends built-in date\datetime functions make them all yyyy-01-01. – Cetin Basoz Feb 09 '22 at 16:07
  • I can't. All I have to do is to randomize the month and the day and keep the year! – Colleric1 Feb 09 '22 at 16:09
  • 3
    If the day/month are meaningless then why bother with random values, just make them all the same? – Stu Feb 09 '22 at 16:11
  • Why random instead of fixed, eg `-01-01`? You can't use random values for month and day, you risk generating invalid dates. With a fixed day you can use `DATEFROMPARTS(Year(BirthDate),1,1)` – Panagiotis Kanavos Feb 09 '22 at 16:13
  • Because this is the task, the rule. It's a business requirement, probably for diversity! :) Can you help me? – Colleric1 Feb 09 '22 at 16:13
  • Why is this the rule? There's no "because they said so" in software engineering. Most of the time, when that phrase is used it means the actual business requirements were confused with the implementation. You were asked to anonymize dates. `1966-01-01` is as anonymous as any other date in that year. – Panagiotis Kanavos Feb 09 '22 at 16:15
  • C'mon bro.. This is the requirement. If you want, you help me, if not, it's okay. – Colleric1 Feb 09 '22 at 16:17
  • 1
    If you really, really need a random date, `DATEADD(d, RAND()*364, DATEFROMPARTS(Year(BirthDate),1,1))` – Panagiotis Kanavos Feb 09 '22 at 16:17
  • @Casper you *assume* that's the requirement. I can't even count the times when I found out that something weird wasn't the requirement at all, just what someone assumed the implementation would look like. In almost all cases, the *real* requirement was a *LOT* easier to implement. As in 100 times easier – Panagiotis Kanavos Feb 09 '22 at 16:19
  • @Casper especially in this case you don't realize what the problem is with the assumed solution. You can't generate month and year randomly without risking invalid dates. You'd even have to identify leap years. That's why I add `RAND()*364` to the year's start – Panagiotis Kanavos Feb 09 '22 at 16:22
  • You could use @PanagiotisKanavos ' method but replace the `RAND()` with the solution [here](https://stackoverflow.com/q/1045138/2029983). Of course for leap years you will *never* get 31 December. – Thom A Feb 09 '22 at 16:23
  • 1
    @Larnu I was coming to that, to explain why blindly following "requirements" without understanding them is a bad idea. Calculating a new birth date for every row can be very expensive for large tables. I bet the business users would be fine with `1966-01-01` – Panagiotis Kanavos Feb 09 '22 at 16:27
  • I don't disagree at all with that, @PanagiotisKanavos . I suspect this requirement is probably being driven by a business manager, not a business user; such people (in my experience) often don't know what is best for the users or system and their arbitrary requirements can often be taken with large pinches of salt. – Thom A Feb 09 '22 at 16:33

1 Answers1

2

Here's one method you can try. This doesn't try to cater for varying days in a month but perhaps it might be "good enough" for your use-case.

(If you really wanted to cater for days 29-31 you could apply the months and then use a case expression on the result to control the mod value for the day)

with sampledata as (
  select Convert(date, '19960704') DOB union all
  select Convert(date, '19550101') DOB
)
select dob, 
  DateFromParts( Year(dob), (Abs(Checksum(NewId())) % 12)+1, (Abs(Checksum(NewId())) % 28)+1) AnonDob, 
from sampledata

DB Fiddle

Stu
  • 30,392
  • 6
  • 14
  • 33