According to the Beam website:
"Often it is faster and simpler to perform local unit testing on your pipeline code than to debug a pipeline’s remote execution."
For this reason, I want to use test-driven development for my Beam/Dataflow app, which writes to Bigtable.
However, following the Beam testing documentation, I reach an impasse: PAssert isn't useful because the output PCollection contains org.apache.hadoop.hbase.client.Put objects, which don't override the equals method, so two Puts with identical contents are never considered equal.
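To show what I mean, here is a minimal sketch of the comparison problem. It deliberately does not use the real Put class or Beam, so it runs with only the JDK: StandInPut, equalByEquals, and equalByContents are hypothetical names made up for this illustration, and the sketch assumes (as described above) that the real class inherits identity-based equals from Object.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EqualsDemo {
    // Hypothetical stand-in for a mutation class that does NOT override equals().
    // This is not the real org.apache.hadoop.hbase.client.Put.
    static final class StandInPut {
        final byte[] row;
        final byte[] value;

        StandInPut(byte[] row, byte[] value) {
            this.row = row;
            this.value = value;
        }
    }

    // What an equals-based assertion effectively relies on.
    static boolean equalByEquals(StandInPut a, StandInPut b) {
        return a.equals(b); // Object.equals: true only for the same instance
    }

    // What I actually care about: the same row key and the same cell value.
    static boolean equalByContents(StandInPut a, StandInPut b) {
        return Arrays.equals(a.row, b.row) && Arrays.equals(a.value, b.value);
    }

    public static void main(String[] args) {
        StandInPut expected = new StandInPut(
                "row1".getBytes(StandardCharsets.UTF_8),
                "v".getBytes(StandardCharsets.UTF_8));
        StandInPut actual = new StandInPut(
                "row1".getBytes(StandardCharsets.UTF_8),
                "v".getBytes(StandardCharsets.UTF_8));

        System.out.println("equals(): " + equalByEquals(expected, actual));
        System.out.println("same contents: " + equalByContents(expected, actual));
    }
}
```

The two objects carry the same bytes, but an assertion built on equals still treats them as different, which is the behaviour I'm seeing with PAssert.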
I can't get the contents of the PCollection to validate them either, since:
"It is not possible to get the contents of a PCollection directly - an Apache Beam or Dataflow pipeline is more like a query plan of what processing should be done, with PCollection being a logical intermediate node in the plan, rather than containing the data."
So how can I test this pipeline other than by running it manually? I'm using Maven and JUnit (in Java, since that's all the Dataflow Bigtable Connector seems to support).