
Using the Amazon deequ library, I'm trying to build a function that takes three parameters: the check object, a string naming the constraint to run, and a string with the constraint criteria. I have a set of checks stored in a MySQL table. My plan is to iterate through all the checks from that table, build a check object with the function described above, and run the checks on a source DataFrame. Here is an example of Amazon deequ: https://towardsdatascience.com/automated-data-quality-testing-at-scale-using-apache-spark-93bb1e2c5cd0

So the function call looks something like this:

    var _check = build_check_object_function(check_object, "hasSize", "10000")

This function should add a new hasSize check to check_object and return it.

The part where I'm stuck is how to translate the hasSize string to the hasSize function.

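    // row is a record read from the MySQL checks table:
    // row(2) holds the constraint name, row(3) the criteria
    var _check = Check(CheckLevel.Error, "Data Validation Check")
    // public methods of Check, skipping compiler-generated names such as hasSize$default$2
    val listOfFunctions = _check.getClass.getMethods.filter(!_.getName().contains('$'))
    for (function <- listOfFunctions) {
      if (function.getName().toLowerCase().contains(row(2).asInstanceOf[String].toLowerCase())) {
        // does not compile: `function` is a java.lang.reflect.Method, not a member of Check
        _check = _check.function(row(3))
      } else {
        println("Not a match")
      }
    }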

Here is the error I'm getting:

    <console>:38: error: value function is not a member of com.amazon.deequ.checks.Check
       if( function.getName().toLowerCase().contains(row(2).asInstanceOf[String].toLowerCase())) {_check = _check.function(row(3))
Riyan Mohammed

1 Answer


You can either use runtime reflection or build a thin translation layer between your database and the deequ declarations.
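A minimal sketch of the runtime-reflection route, which also shows why the question's code fails: `function` is a `java.lang.reflect.Method` value, so `_check.function(row(3))` asks the compiler for a member literally named `function` on `Check`. A `Method` has to be called through `Method.invoke`. To keep the sketch runnable without Spark, `MiniCheck`, `ReflectiveCall` and `callByName` are hypothetical stand-ins, not deequ API; `MiniCheck` only mimics the shape of deequ's `hasSize` (a `Long => Boolean` assertion plus an optional hint with a default value).

```scala
// Hypothetical stand-in for deequ's Check: immutable, and each constraint
// method returns a new instance with the constraint appended.
case class MiniCheck(constraints: Seq[String] = Seq.empty) {
  def hasSize(assertion: Long => Boolean, hint: Option[String] = None): MiniCheck =
    copy(constraints = constraints :+ s"hasSize(hint=$hint)")
}

object ReflectiveCall {
  // Looks a public method up by name and arity, then calls it via Method.invoke.
  // Writing `check.method(...)` instead would not compile: the compiler would look
  // for a member literally named `method` on the check class.
  def callByName(check: MiniCheck, name: String, args: AnyRef*): MiniCheck = {
    val method = check.getClass.getMethods
      .find(m => m.getName.equalsIgnoreCase(name) && m.getParameterCount == args.length)
      .getOrElse(throw new NoSuchMethodException(name))
    method.invoke(check, args: _*).asInstanceOf[MiniCheck]
  }

  def main(args: Array[String]): Unit = {
    // Java reflection does not see Scala default arguments, so the optional
    // hint must be passed explicitly (here as None).
    val sizeIs10000: Long => Boolean = _ == 10000L
    val updated = callByName(MiniCheck(), "hasSize", sizeIs10000, None)
    println(updated.constraints)
  }
}
```

Every parameter, including the optional hint, has to be supplied, and the criteria string still has to be converted into the right argument type for each method (here a `Long => Boolean`). That per-method plumbing is a large part of why the explicit translation suggested next is usually simpler.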

I would suggest explicitly translating the constraint/check strings from your database into deequ declarations, e.g.:

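    import com.amazon.deequ.checks.{Check, CheckLevel}
    import com.amazon.deequ.constraints.Constraint

    // constraint: the rule name read from the database
    if (constraint == "hasSize") {
      // as a Constraint
      Constraint.sizeConstraint(_ <= 10)
      // as a Check
      Check(CheckLevel.Error, "name").hasSize(_ <= 10)
    }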
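A slightly fuller sketch of that translation layer, shaped like the question's `build_check_object_function(check_object, "hasSize", "10000")`. The names `CheckBuilder`, `buildCheck` and `fromRules` are hypothetical, and so is the criteria convention: an exact row count for `hasSize` and a column name for `isComplete` and `isUnique`. The deequ calls used (`Check`, `hasSize`, `isComplete`, `isUnique`) need deequ on the classpath.

```scala
import com.amazon.deequ.checks.{Check, CheckLevel}

object CheckBuilder {
  // Hypothetical translation from (constraint name, criteria string) to a deequ call.
  // The criteria conventions here are assumptions, not a deequ format.
  def buildCheck(check: Check, constraint: String, criteria: String): Check =
    constraint match {
      case "hasSize" =>
        val expected = criteria.trim.toLong // e.g. "10000" -> exactly 10000 rows
        check.hasSize(_ == expected)
      case "isComplete" => check.isComplete(criteria.trim) // criteria = column name
      case "isUnique"   => check.isUnique(criteria.trim)   // criteria = column name
      case other =>
        // Fail loudly instead of silently skipping a rule the table asks for.
        throw new IllegalArgumentException(s"Unsupported constraint: $other")
    }

  // Folds all rules (e.g. rows from the MySQL table) into a single Check.
  def fromRules(rules: Seq[(String, String)]): Check =
    rules.foldLeft(Check(CheckLevel.Error, "Data Validation Check")) {
      case (acc, (constraint, criteria)) => buildCheck(acc, constraint, criteria)
    }
}
```

The resulting check can then be run with deequ's `VerificationSuite().onData(df).addCheck(check).run()`. Each newly supported rule needs one more `case`, which is the trade-off raised in the comment below, but unknown names fail fast instead of being ignored.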
Philipp
  • Yeah, but this way, if a new rule is added to the deequ library, I have to come back and add it to my code. – Riyan Mohammed Apr 06 '20 at 16:07
  • The other option is to store the actual `deequ` source code in the table and use the Scala compiler to compile it. See this answer for an example: https://stackoverflow.com/questions/58453541/load-constraints-from-csv-file-amazon-deequ/69804027#69804027 – Joe McMahon Nov 09 '21 at 00:01
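A minimal sketch of the compile-at-runtime option from the last comment, assuming Scala 2 with `scala-compiler` on the classpath. It uses the standard `scala.tools.reflect.ToolBox` API; `CompiledCriteria` and `compileFunction` are hypothetical names, and this is not necessarily what the linked answer does.

```scala
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

object CompiledCriteria {
  // Needs scala-compiler on the classpath; ToolBox is Scala 2 only.
  private lazy val toolbox = currentMirror.mkToolBox()

  // Parses and compiles Scala source text at runtime and casts the result to a function.
  // Only the Function1 shape is checked by the cast; mismatched type arguments
  // surface when the function is applied.
  def compileFunction[A, B](source: String): A => B =
    toolbox.eval(toolbox.parse(source)).asInstanceOf[A => B]

  def main(args: Array[String]): Unit = {
    // A criterion stored as text (e.g. in the MySQL table), usable with hasSize.
    val sizeIs10000 = compileFunction[Long, Boolean]("(n: Long) => n == 10000L")
    println(sizeIs10000(10000L))
    // With deequ on the classpath, a whole rule could be stored the same way:
    // compileFunction[Check, Check]("(c: com.amazon.deequ.checks.Check) => c.hasSize(_ == 10000L)")
  }
}
```

Compilation takes noticeable time, so compile each stored rule once and reuse the function. Because the table's contents are executed as code, anyone who can write to that table can run arbitrary code in your application.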