
I would like to create a Dataflow template with a default value for one of its PipelineOptions parameters.

Inspired by examples online, I use a ValueProvider placeholder for deferred parameter setting in my PipelineOptions "sub"-interface:

  @Default.String("MyDefaultValue")
  ValueProvider<String> getMyValue();
  void setMyValue(ValueProvider<String> value);
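
For context, the complete options interface would look roughly like this (a sketch; the interface name TemplateOptions and the imports are illustrative placeholders, not taken from the question):

    import org.apache.beam.sdk.options.Default;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.ValueProvider;

    // Hypothetical name for the PipelineOptions "sub"-interface described above.
    public interface TemplateOptions extends PipelineOptions {

      // ValueProvider defers resolution of the value until the job runs;
      // @Default.String supplies the value used when the parameter is not set.
      @Default.String("MyDefaultValue")
      ValueProvider<String> getMyValue();

      void setMyValue(ValueProvider<String> value);
    }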

If I specify the parameter at runtime, the template works for launching a real GCP Dataflow job. However, before doing this for real, I want to test what happens when I do not include the parameter:

@Rule
public TestPipeline pipeline = TestPipeline.create();
...
  
@Test
public void test() {
  PipelineOptions options = PipelineOptionsFactory.fromArgs(new String[] {...}).withValidation().create();
  ...
  pipeline.run(options);
}

Then, when my TestPipeline executes a DoFn processElement method where the parameter is needed, I get:

IllegalStateException: Value only available at runtime, but accessed from a non-runtime context: 
RuntimeValueProvider{propertyName=myValue, default=MyDefaultValue}
...

More specifically, it fails here in org.apache.beam.sdk.options.ValueProvider:

@Override
public T get() {
  PipelineOptions options = optionsMap.get(optionsId);
  if (options == null) {
    throw new IllegalStateException(...

One could presumably be forgiven for thinking runtime is when the pipeline is running.

Anyway, does anybody know how I would unit test the parameter defaulting, assuming the code snippet at the top is how it should be set up and that this is supported? Thank you.

  • According to the documentation "For unit-testing a transform against a ValueProvider that only becomes available at runtime, use TestPipeline.newProvider(T)." Does using newProvider solve your problem? https://beam.apache.org/releases/javadoc/2.11.0/index.html?org/apache/beam/sdk/testing/TestPipeline.html – rmesteves Jul 23 '20 at 11:29
  • I saw that, but does this not set the parameter value explicitly? (I am trying to test that it uses the annotated default value.) – nsandersen Jul 23 '20 at 13:04
  • Looking at the code, it looks like ValueProviders are accessible once a PipelineOptions object has been deserialized ([source code](https://github.com/apache/beam/blob/6fdde4f4eab72b49b10a8bb1cb3be263c5c416b5/sdks/java/core/src/main/java/org/apache/beam/sdk/options/ValueProvider.java#L240)). How are you testing this out? Are you running it in a way that wouldn't deserialize a `PipelineOptions`? (for ex. running on a local runner, or not using any PipelineOptions when running) – Daniel Oliveira Jul 23 '20 at 22:35
  • @nsandersen I agree with Daniel about the deserialized object. Can you let us know more? – rmesteves Jul 27 '20 at 13:38
  • Apologies, I had to work on something else for a while. Example code I use to run pipeline tests added. – nsandersen Dec 03 '20 at 19:05
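
As a concrete illustration of the TestPipeline.newProvider(T) approach suggested in the comments above, a minimal sketch (class, test, and value names are illustrative):

    import org.apache.beam.sdk.options.ValueProvider;
    import org.apache.beam.sdk.testing.TestPipeline;
    import org.junit.Rule;
    import org.junit.Test;

    public class MyValueTest {

      @Rule
      public final transient TestPipeline pipeline = TestPipeline.create();

      @Test
      public void runsWithExplicitRuntimeValue() {
        // newProvider returns a ValueProvider whose value only becomes available
        // while this TestPipeline runs, mimicking a template parameter set at launch.
        ValueProvider<String> myValue = pipeline.newProvider("SomeExplicitValue");

        // ... apply the transforms under test here, passing myValue to the DoFn ...

        pipeline.run().waitUntilFinish();
      }
    }

As noted in the comments, this sets the value explicitly, so it exercises the runtime plumbing but does not by itself verify that the @Default.String default is picked up.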

2 Answers


I had the same problem when I was generating a Dataflow template from Eclipse; my Dataflow template receives a parameter from a Cloud Composer DAG.

I got the solution from the Google Cloud documentation: https://cloud.google.com/dataflow/docs/guides/templates/creating-templates#using-valueprovider-in-your-functions
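
The pattern described on that page is, roughly, to pass the ValueProvider itself into the DoFn and call get() only inside processElement. A minimal sketch, with illustrative names:

    import org.apache.beam.sdk.options.ValueProvider;
    import org.apache.beam.sdk.transforms.DoFn;

    // Illustrative DoFn that receives the template parameter as a ValueProvider.
    class AppendMyValueFn extends DoFn<String, String> {

      private final ValueProvider<String> myValue;

      AppendMyValueFn(ValueProvider<String> myValue) {
        // Store the provider, not the value: the value may not exist yet
        // when the template is constructed.
        this.myValue = myValue;
      }

      @ProcessElement
      public void processElement(ProcessContext c) {
        // get() is only called at processing time, when the runtime value
        // (or the declared default) is available.
        c.output(c.element() + myValue.get());
      }
    }

It would typically be applied with something like ParDo.of(new AppendMyValueFn(options.getMyValue())).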


You can also use Flex Templates and avoid all the hassle with ValueProviders.

robertwb
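
To illustrate the point: with a Flex Template the parameter can be a plain option type with an ordinary default, no ValueProvider involved (interface and parameter names are assumptions, mirroring the question):

    import org.apache.beam.sdk.options.Default;
    import org.apache.beam.sdk.options.PipelineOptions;

    // With a Flex Template the parameter is an ordinary option; the default
    // applies whenever the parameter is not passed at launch.
    public interface TemplateOptions extends PipelineOptions {

      @Default.String("MyDefaultValue")
      String getMyValue();

      void setMyValue(String value);
    }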