
I'm trying to drive the data-validation framework Deequ from a parsed YAML file. Adding a check to the data looks like this:

result.addCheck(check.hasSomething(arguments))

To make it more accessible, I wrote this loop:

for exp in checks[table]:
    params = checks[table][exp]
    validate = getattr(check, exp)
    result = result.addCheck(
        validate(*params)
    )

And my YAML looks like this:

checks = """
table:
  hasSize:
   - "lambda x: x < 55000"
  isUnique: 
   - "customer"
"""

So ideally I would like to loop over the checks in the parsed YAML, as in `for exp in checks[table]:`, and call `result.addCheck(validate(*params))` for each one.

The problem is that even with the asterisk in `validate(*params)`, the argument still seems to be treated as a string:

Can't execute the assertion: An exception was raised by the Python Proxy. Return Message:
Traceback (most recent call last):
  File "/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 2442, in _call_proxy
    return_value = getattr(self.pool[obj_id], method)(*params)
  File "/databricks/python/lib/python3.8/site-packages/pydeequ/scala_utils.py", line 37, in apply
    return self.lambda_function(arg)
TypeError: 'str' object is not callable !
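The error can be reproduced without Spark. The sketch below is only an illustration: `has_size` is a hypothetical stand-in for a check that calls whatever assertion it receives, not pydeequ's real implementation. It shows that `*` only unpacks the list into positional arguments, so the element arrives as a `str` and is never evaluated.

```python
def has_size(assertion):
    # Hypothetical stand-in for a check such as hasSize:
    # it simply calls the assertion it is given with a row count.
    return assertion(100)


def try_call(args):
    # Unpack the list into positional arguments, as validate(*params) does.
    try:
        return has_size(*args)
    except TypeError as err:
        return f"TypeError: {err}"


# What the parsed YAML yields: a list containing a plain string.
params = ["lambda x: x < 55000"]

as_string = try_call(params)                   # the string is passed through unchanged
as_callable = try_call([lambda x: x < 55000])  # a real function object works

print(as_string)
print(as_callable)
```

The first call fails the same way as the traceback above, while the second succeeds, so the fix has to happen before the call: the YAML value must be converted into a callable first.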

Do you have any idea how to pass this "lambda x: x < 55000" into a function so it can be callable?

Thanks

Jakobkubek
  • The [asterisk does not turn a string into code](https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters), the [`eval` built-in function](https://stackoverflow.com/questions/9383740/what-does-pythons-eval-do) does. However, this will open your script to arbitrary code execution, so use it with care. – Friedrich Mar 02 '23 at 10:43
  • As Friedrich says, don't run a string as code. Instead, give your YAML a `maxSize: 55000` entry, read it from the parsed YAML, and check it against your table, e.g. `.hasSize(lambda x: x < checks[table]["maxSize"])`. – Thering Mar 02 '23 at 12:18
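One way to follow the advice in these comments is to store the comparison as data rather than as code. The sketch below assumes a hypothetical YAML layout in which the `hasSize` argument is written as `{op: lt, value: 55000}`; the `COMPARATORS` table, `build_assertion`, `resolve_params`, and `FakeCheck` are illustrative names, and `FakeCheck` only stands in for the real Deequ check object so that the example runs on its own.

```python
import operator

# Hypothetical mapping from operator names in the YAML to comparison functions.
COMPARATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
    "eq": operator.eq,
}


def build_assertion(spec):
    """Turn {"op": "lt", "value": 55000} into a callable x -> bool."""
    compare = COMPARATORS[spec["op"]]
    threshold = spec["value"]
    return lambda x: compare(x, threshold)


def resolve_params(params):
    # Dict entries describe a comparison; everything else is passed through.
    return [build_assertion(p) if isinstance(p, dict) else p for p in params]


class FakeCheck:
    # Stand-in for the real check object, used only to make the sketch runnable.
    def hasSize(self, assertion):
        return ("hasSize", assertion(100))

    def isUnique(self, column):
        return ("isUnique", column)


# What the parsed YAML would look like under the assumed layout.
checks = {
    "table": {
        "hasSize": [{"op": "lt", "value": 55000}],
        "isUnique": ["customer"],
    }
}

check = FakeCheck()
results = []
for exp, params in checks["table"].items():
    validate = getattr(check, exp)
    results.append(validate(*resolve_params(params)))

print(results)
```

Because only names from a fixed table are accepted, an unexpected operator raises a `KeyError` instead of executing arbitrary text from the file, which is the risk with `eval`.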

0 Answers