
I'm currently exploring the Deequ library, and I'm trying to understand whether it's possible to check the uniqueness of a combination of columns.

This code

.hasUniqueness(Seq("col1", "col2"), Check.IsOne))

seems to calculate uniqueness for each column separately (correct me if I'm wrong).

Thanks

Dawid

1 Answer


I am one of the authors of Deequ. Your code snippet should calculate the uniqueness of the combined columns. If you feel that something is wrong with the result, then I would encourage you to open an issue at https://github.com/awslabs/deequ/issues and provide some sample code so that we can reproduce the error.
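To make "uniqueness of the combined columns" concrete, here is a minimal sketch in plain Scala collections. It is not Deequ's implementation and does not call Deequ. It assumes uniqueness means the fraction of rows whose value (or tuple of values) occurs exactly once, and it ignores null handling entirely. The object name, the `uniqueness` helper, and the sample rows are all hypothetical.

```scala
// Illustrative only: plain Scala, not Deequ's implementation.
object CombinedUniquenessSketch {

  // Assumed definition: fraction of rows whose key occurs exactly once.
  // A key may be a single column value or a tuple of column values.
  def uniqueness[K](keys: Seq[K]): Double = {
    require(keys.nonEmpty, "uniqueness is undefined for an empty dataset")
    val counts = keys.groupBy(identity).map { case (k, group) => k -> group.size }
    val singletons = counts.values.count(_ == 1)
    singletons.toDouble / keys.size
  }

  // Hypothetical sample rows: col1 is unique, col2 is not.
  val rows: Seq[(Int, String)] = Seq((1, "a"), (2, "a"), (3, "b"))

  val col1Only: Double = uniqueness(rows.map(_._1)) // each col1 value occurs once
  val col2Only: Double = uniqueness(rows.map(_._2)) // "a" repeats, so below 1.0
  val combined: Double = uniqueness(rows)           // whole (col1, col2) tuple as key

  def main(args: Array[String]): Unit =
    println(s"col1: $col1Only, col2: $col2Only, combined: $combined")
}
```

With these rows the combined key is unique for every row even though `col2` on its own is not, which is the behaviour expected from a check on the combination rather than on each column separately.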

Best, Sebastian

ssc
  • Thanks! Removed in favor of your answer. – Fabio Manzano Oct 07 '19 at 19:46
  • Sebastian, many thanks for your reply. Perhaps there is something I don't understand here, but if .hasUniqueness(Seq("col1"), Check.IsOne)) returns Success, I would expect .hasUniqueness(Seq("col1", "col2"), Check.IsOne)) to be a Success as well. A combination of a unique column and a non-unique one should still be unique, right? – Dawid Oct 14 '19 at 09:37
  • Yes, that should be the case. – ssc Oct 15 '19 at 10:15
  • OK, it appears not to follow this rule if one of the columns contains nulls. – Dawid Mar 06 '20 at 10:16
  • We fixed this problem in Deequ recently and will cut a release with the fix soon. – ssc Mar 07 '20 at 11:44