1

I am scrubbing data, specifically a date-of-birth field, and I want the result to be consistent: the same input date should always produce the same random replacement date. How can I do this?

I have tried the random generation code below, but it produces a different result on every run, even when I provide the same input. I want the random output to remain consistent.

import java.util.GregorianCalendar;

public class RandomDateOfBirth {

    public static void main(String[] args) {
        GregorianCalendar gc = new GregorianCalendar();
        int year = randBetween(1900, 2010);
        gc.set(gc.YEAR, year);
        int dayOfYear = randBetween(1, gc.getActualMaximum(gc.DAY_OF_YEAR));
        gc.set(gc.DAY_OF_YEAR, dayOfYear);
        System.out.println(gc.get(gc.YEAR) + "-" + (gc.get(gc.MONTH) + 1) + "-" + gc.get(gc.DAY_OF_MONTH));
    }

    public static int randBetween(int start, int end) {
        return start + (int) Math.round(Math.random() * (end - start));
    }

}
ETO
Shaun
  • If you are replacing a sensitive piece of data with a value that can be repeatedly determined, you have not really scrubbed your data. Also, your title is a contradiction. A random value cannot be predictably repeated, by definition. – Basil Bourque Oct 08 '19 at 04:14
  • Your tag defines data-scrubbing as *correcting (or removing) corrupt or inaccurate records*. I fail to see how a random date of birth can be considered correct? Please explain more precisely what you are trying to obtain and why. – Ole V.V. Oct 08 '19 at 13:42

3 Answers

1

Beware: If you are replacing a sensitive piece of data with a value that can be repeatedly determined, you have not really scrubbed your data. If your purpose is to protect sensitive data, for example to comply with HIPAA, I suggest you consult someone who is in charge. They should be trained in how to scrub data appropriately.

Another point to clarify: Your title is a contradiction. A random value cannot be predictably repeated, by definition.

java.time

Your code example uses terrible date-time classes that were supplanted years ago by the modern java.time classes defined in JSR 310. For a date-only value, use the LocalDate class.

Just assign an arbitrary number of days

If you want an arbitrary yet repeatable adjustment, just add or subtract a certain number of days. You could arbitrarily assign a negative number (subtraction) for a date whose day number is odd, and assign a positive number (addition) for a date whose day number is even.

To determine whether a number is even or odd, see this Question.

int daysToAddToOddDayNumber = -2_555 ;
int daysToAddToEvenDayNumber = 2_101 ; 

LocalDate localDate = LocalDate.of( 1970 , Month.JANUARY , 1 );
boolean isEven = ( ( localDate.getDayOfMonth() & 1) == 0 ) ;
LocalDate adjusted = isEven ? localDate.plusDays( daysToAddToEvenDayNumber ) : localDate.plusDays( daysToAddToOddDayNumber ) ;

Dump to console.

System.out.println( "localDate.toString(): " + localDate ) ;
System.out.println( "adjusted.toString(): " + adjusted ) ;

See this code run live at IdeOne.com.

localDate.toString(): 1970-01-01

adjusted.toString(): 1963-01-03

Obscure the number of days to be added

You could get a bit fancy by taking a hash of the date's value, then using that hash result to determine the number of days to subtract. Again, as I said before, this may not qualify as sufficient scrubbing, depending on the needs (and laws!) of your project.

LocalDate localDate = LocalDate.of( 1970 , Month.JANUARY , 1 );
String input = localDate.toString();

try
{
    MessageDigest md = MessageDigest.getInstance( "MD5" );
    md.update( input.getBytes( StandardCharsets.UTF_8 ) );  // Explicit charset, so the hash does not depend on the platform default.
    byte[] digest = md.digest();
    int days = new BigInteger( 1 , digest ).mod( new BigInteger( "10000" ) ).intValue();
    LocalDate adjusted = localDate.minusDays( days );

    System.out.println( "localDate = " + localDate );
    System.out.println( "input = " + input );
    System.out.println( "days = " + days );
    System.out.println( "adjusted = " + adjusted );
} catch ( NoSuchAlgorithmException e )
{
    e.printStackTrace();
}

See this code run live at IdeOne.com.

localDate = 1970-01-01

input = 1970-01-01

days = 8491

adjusted = 1946-10-03
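
One possible variation on the hashing idea, offered as a sketch rather than a vetted scrubbing scheme: replace the plain MD5 digest with a keyed hash (HMAC-SHA256 from javax.crypto), so the day offset cannot be recomputed by someone who knows only the input date. The class name KeyedDateShifter, the method shift, and the secretKey parameter are hypothetical names introduced for this illustration, and the 10,000-day modulus is simply carried over from the code above.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.time.LocalDate;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: shifts a date back by a key-dependent number of days.
final class KeyedDateShifter {

    // secretKey is an assumption: a non-empty key that is stored outside the scrubbed data set.
    static LocalDate shift(LocalDate input, byte[] secretKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
            byte[] digest = mac.doFinal(input.toString().getBytes(StandardCharsets.UTF_8));
            // Reduce the digest to an offset in the range 0..9999 days.
            int days = new BigInteger(1, digest).mod(BigInteger.valueOf(10_000)).intValue();
            return input.minusDays(days);
        } catch (GeneralSecurityException e) {
            // Unchecked wrapper keeps the call site simple.
            throw new IllegalStateException("HMAC-SHA256 could not be initialized", e);
        }
    }
}
```

Calling KeyedDateShifter.shift with the same date and the same key returns the same adjusted date every time. Whether this counts as adequate scrubbing still depends on the rules that apply to your project.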

Community
Basil Bourque
0

The Random class accepts a seed, which should do the trick:

public static int randBetween (int start, int end){
    int seed = end +(start*10000);
    return start + new Random(seed).nextInt((end-start));
}

The thought is that using end + (start * 10000) should give a unique but repeatable seed.
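
A minimal sketch of how the seeding idea could be applied to a date directly, assuming the input is a LocalDate: seed a Random with the input's epoch day, so the same input always yields the same replacement. The class name SeededDateScrubber, the secretSalt parameter, and the 1900 to 2010 output range (taken from the question) are illustrative assumptions, not part of the answer above.

```java
import java.time.LocalDate;
import java.util.Random;

// Hypothetical helper: maps each input date to a repeatable pseudo-random date.
final class SeededDateScrubber {

    // Assumed output range, mirroring the 1900..2010 range used in the question.
    private static final long MIN_DAY = LocalDate.of(1900, 1, 1).toEpochDay();
    private static final long MAX_DAY = LocalDate.of(2010, 12, 31).toEpochDay();

    static LocalDate scrub(LocalDate input, long secretSalt) {
        // Same input and same salt give the same seed, hence the same first draw.
        Random random = new Random(input.toEpochDay() ^ secretSalt);
        int span = (int) (MAX_DAY - MIN_DAY + 1);
        return LocalDate.ofEpochDay(MIN_DAY + random.nextInt(span));
    }
}
```

Because anyone who knows how the seed is derived can reproduce the mapping, this is repeatable rather than secure, which is the concern raised in the comments on the question.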

Jesper Hustad
  • Doesn't it return a new randomly generated date each time? I want it to return the same random date when I feed in the same input date. – Shaun Oct 07 '19 at 21:22
  • As long as the `seed` is the same then the Random class will generate the same number, so by combining the two dates `end +(start*10000)` we will get a unique number that can be used as the seed :) – Jesper Hustad Oct 07 '19 at 21:27
  • Yes, this can work for generating the same random number, but will it work for dates? Thanks for your help, though; I really appreciate it. – Shaun Oct 07 '19 at 21:57
  • Yes, it will work for dates too, but then you need to think in terms of days, not years. – Jesper Hustad Oct 07 '19 at 22:59
  • So you could get the number from `year*365+month*(365/12)+days`, then add another date multiplied by a large number, something like 10^7. – Jesper Hustad Oct 07 '19 at 23:05
0

Assuming you have a method generateRandomDate() and you want to scrub a list of dates, the following should do the trick:

final Map<LocalDate, LocalDate> map = new HashMap<>();

List<LocalDate> initialDates = ...;
List<LocalDate> scrubbedDates =
    initialDates.stream()
                    .map(date -> map.computeIfAbsent(date, __ -> generateRandomDate()))
                    .collect(toList());

Identical input dates will be replaced with the same randomly generated date.


The generateRandomDate method can be implemented as follows:

public static LocalDate generateRandomDate() {
    Random random = new Random();
    int minDay = (int) LocalDate.of(1900, 1, 1).toEpochDay();
    int maxDay = (int) LocalDate.of(2015, 1, 1).toEpochDay();
    long randomDay = minDay + random.nextInt(maxDay - minDay);

    return LocalDate.ofEpochDay(randomDay);
}

The date-generation code snippet is borrowed from here.
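
For reference, the two pieces above could be combined into one self-contained class, sketched here under a few assumptions: the class name LookupTableScrubber and the methods scrub and scrubAll are hypothetical, and the lookup table lives only in memory, so the mapping is consistent only for the lifetime of one instance unless the map is persisted somewhere.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;

// Hypothetical wrapper: remembers which random date was handed out for each input date.
final class LookupTableScrubber {

    private final Map<LocalDate, LocalDate> replacements = new HashMap<>();
    private final Random random = new Random();

    // Returns the stored replacement, generating one on first sight of the input.
    LocalDate scrub(LocalDate input) {
        return replacements.computeIfAbsent(input, unused -> randomDate());
    }

    // Scrubs a whole list sequentially, reusing replacements for repeated dates.
    List<LocalDate> scrubAll(List<LocalDate> inputs) {
        return inputs.stream().map(this::scrub).collect(Collectors.toList());
    }

    // Same range as generateRandomDate() above: 1900-01-01 inclusive to 2015-01-01 exclusive.
    private LocalDate randomDate() {
        int minDay = (int) LocalDate.of(1900, 1, 1).toEpochDay();
        int maxDay = (int) LocalDate.of(2015, 1, 1).toEpochDay();
        return LocalDate.ofEpochDay(minDay + random.nextInt(maxDay - minDay));
    }
}
```

Unlike a seeded or hashed approach, the replacement here cannot be recomputed from the input alone, but two different inputs may occasionally receive the same replacement date.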

ETO