I have a simple .NET for Apache Spark app and I have tried to break it down into units for testing. A sample unit:

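// Assumes: using Microsoft.Spark.Sql;
public DataFrame filtermyname(DataFrame df, string name)
{
   // Compare the "name" column to the argument; "name" == name would only compare two strings.
   return df.Filter(df["name"] == name);
}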

Since unit tests should not have external dependencies, my organisation does not allow installing Spark on the build servers. Is there a way to test this without installing Spark, by mocking the session?

Selva
  • I mean, just wrap the DataFrame in an interface and mock it so that the interface does whatever the concrete DataFrame implementation does. However, that doesn't validate that the concrete implementation does what it is supposed to. If you want to validate that a DataFrame does what it is supposed to, you need access to the dependency in order to test it; otherwise, you aren't really testing it. I am not sure I understand your question. – Morten Bork Aug 25 '22 at 06:52
  • Thanks @MortenBork. I am expecting something like what is mentioned in https://github.com/dotnet/spark/issues/637. I tried to implement it but couldn’t make it work. – Selva Aug 25 '22 at 17:52

1 Answer

I am not 100% sure I fully understand you, or the complications of your architecture.

But I would assume that you can take the call you have here:

df.Filter(df["name"] == name);

and replace it with a call through an interface:

public interface IFilterSource {
   IFilterSource FilterByText(string filterText);
}

Then have something implement IFilterSource. DataFrame is a library class, so you cannot add the interface to it directly; instead, make an implementation of IFilterSource that has the DataFrame as a property, and apply the filter on that property.
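A minimal sketch of such a wrapper might look like the following. The class name DataFrameFilterSource and the Frame property are hypothetical names chosen for illustration, and the sketch assumes the Microsoft.Spark NuGet package is referenced and that the column to filter on is called "name", as in the question.

```csharp
using Microsoft.Spark.Sql;

public interface IFilterSource
{
    IFilterSource FilterByText(string filterText);
}

// Hypothetical adapter: holds the real DataFrame and delegates the filter to it.
public sealed class DataFrameFilterSource : IFilterSource
{
    public DataFrameFilterSource(DataFrame frame)
    {
        Frame = frame;
    }

    // The wrapped Spark DataFrame, exposed so production code can keep working with it.
    public DataFrame Frame { get; }

    public IFilterSource FilterByText(string filterText)
    {
        // Assumption: the filter column is "name", as in the question.
        DataFrame filtered = Frame.Filter(Frame["name"] == filterText);
        return new DataFrameFilterSource(filtered);
    }
}
```

Only this class touches Spark, so it is the one piece that would still need an environment with Spark installed to exercise for real.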

So your method becomes:

public IFilterSource filtermyname(IFilterSource source, string name) 
{
   return source.FilterByText(name);
}
 

Now you can mock IFilterSource in your unit tests, and use the concrete DataFrame-backed implementation in production.
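As a rough illustration of what such a test could look like without Spark on the build server, here is a sketch that uses a hand-written in-memory fake instead of a mocking library. InMemoryFilterSource, NameFilters and the sample names are all hypothetical; a mocking framework could play the same role.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IFilterSource
{
    IFilterSource FilterByText(string filterText);
}

// Hypothetical in-memory stand-in for the Spark-backed implementation.
public sealed class InMemoryFilterSource : IFilterSource
{
    public InMemoryFilterSource(IEnumerable<string> names)
    {
        Names = names.ToList();
    }

    public IReadOnlyList<string> Names { get; }

    // Keeps only the entries equal to filterText, mimicking the "name" filter.
    public IFilterSource FilterByText(string filterText)
    {
        return new InMemoryFilterSource(Names.Where(n => n == filterText));
    }
}

public static class NameFilters
{
    // The unit under test, written against the interface rather than DataFrame.
    public static IFilterSource filtermyname(IFilterSource source, string name)
    {
        return source.FilterByText(name);
    }
}

public static class Program
{
    public static void Main()
    {
        var source = new InMemoryFilterSource(new[] { "Selva", "Morten", "Selva" });
        var result = (InMemoryFilterSource)NameFilters.filtermyname(source, "Selva");
        Console.WriteLine(result.Names.Count); // prints 2
    }
}
```

A test like this only checks that your method calls the interface correctly; it says nothing about whether Spark itself filters as expected, which still needs a real Spark environment.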

Morten Bork