I am creating unit tests for a Scala application using ScalaTest. I have the actual and expected results as Datasets. When I verified manually, both the data and the schema match between the actual and expected datasets.

Actual Dataset= actual_ds
Expected Dataset = expected_ds

When I execute the command below, it returns false.

assert(actual_ds.equals(expected_ds))

Could anyone suggest what the reason could be? And is there any other built-in function to compare datasets in Scala?

Dmytro Mitin
kiruba

2 Answers

2

That .equals() is the one inherited from Java's Object.equals, which compares object references rather than contents, so it is expected that the assert fails.
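
To illustrate the point without Spark, here is a minimal sketch. Handle is a hypothetical stand-in for any class that, like Dataset, does not override equals; the names are mine, not from the question.

```scala
// Hypothetical stand-in for a class that does not override equals.
final class Handle(val rows: Seq[Int])

object EqualsDemo {
  val a = new Handle(Seq(1, 2, 3))
  val b = new Handle(Seq(1, 2, 3))

  // Inherited reference equality: two distinct objects are never equal.
  val sameContentDifferentObject: Boolean = a.equals(b)

  // The same reference is always equal to itself.
  val sameObject: Boolean = a.equals(a)

  // Seq defines element-wise equality, so comparing the contents works.
  val contentsCompared: Boolean = a.rows == b.rows
}
```

Only the last comparison looks at the data, which is why a Dataset comparison has to go through the rows and the schema rather than through equals.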

I would start testing two datasets with:

  1. assert(actual_ds.schema == expected_ds.schema)
  2. assert(actual_ds.count() == expected_ds.count())

Then check this question: DataFrame equality in Apache Spark
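
One way to go a step further than schema and count is to compare the rows themselves. The sketch below is only an assumption about what would fit this case: DatasetChecks and sameContents are hypothetical names, it relies on exceptAll and isEmpty (available from Spark 2.4), and it will not work for datasets with map-typed columns, since Spark does not allow set operations on them.

```scala
import org.apache.spark.sql.Dataset

object DatasetChecks {
  // Hypothetical helper: true when both datasets have the same schema and
  // the same rows, ignoring row order but respecting duplicates.
  def sameContents[T](actual: Dataset[T], expected: Dataset[T]): Boolean = {
    // Schema equality also compares nullable flags, a common source of
    // mismatches between datasets that otherwise look identical.
    val sameSchema = actual.schema == expected.schema

    // exceptAll keeps duplicates, so two empty differences mean the rows
    // match as multisets. The && short-circuits if the schemas differ.
    sameSchema &&
      actual.exceptAll(expected).isEmpty &&
      expected.exceptAll(actual).isEmpty
  }
}
```

In a test this would be used as assert(DatasetChecks.sameContents(actual_ds, expected_ds)). If row order matters, compare collected rows instead.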

Dmytro Mitin
Aspa
1

Use one of the libraries designed for Spark tests: spark-fast-tests, spark-testing-base, or spark-test.

They are quite easy to use, and with their help it's easy to compare two datasets and get a formatted message on output.

You may start with spark-fast-tests (you can find usage in its README file) and check the others if it does not suit your needs (for example, if you need different output formatting).
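
A minimal sketch of what a spark-fast-tests check might look like, assuming the library and ScalaTest 3.1+ are on the test classpath. The suite name, session settings, and sample data are placeholders of mine; check the README for the version you use, since the exact options may differ.

```scala
import com.github.mrpowers.spark.fast.tests.DatasetComparer
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical suite name; DatasetComparer provides the assertion used below.
class ActualVsExpectedSpec extends AnyFunSuite with DatasetComparer {

  // Local session for tests only (assumed settings).
  private lazy val spark: SparkSession =
    SparkSession.builder().master("local[1]").appName("dataset-comparison").getOrCreate()

  test("actual dataset matches expected dataset") {
    import spark.implicits._

    // Placeholder data standing in for the real actual_ds and expected_ds.
    val actual_ds = Seq(("a", 1), ("b", 2)).toDS()
    val expected_ds = Seq(("a", 1), ("b", 2)).toDS()

    // Fails with a formatted diff when the schema or the rows differ.
    assertSmallDatasetEquality(actual_ds, expected_ds)
  }
}
```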

M_S