
We are doing exactly what the answer to this question describes:

Apache Beam/Dataflow Reshuffle

I have also read this question:

Apache Beam/Dataflow ReShuffle deprecated, what to use instead?

How can I modify the code below so that it no longer uses the deprecated Reshuffle?

    PCollection<OrderlyBeamDto<RosterRecord>> records = PCollectionList
        .of(csv.get(TupleTags.OUTPUT_RECORD))
        .and(excel.get(TupleTags.OUTPUT_RECORD))
        .apply(Flatten.pCollections())
        .apply("Reshuffle Records", Reshuffle.viaRandomKey());

Also, in the local Apache Beam runner, this reshuffle intermittently drops events. Our tests are flaky, sometimes passing and sometimes failing, because this step drops events, but only when the reshuffle is present (we added logging to confirm that). I am hoping that replacing Reshuffle with whatever the currently recommended approach is will fix this issue as well.
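To make the expectation concrete, here is a minimal plain-Java sketch (no Beam involved) of what I understand a random-key reshuffle to do conceptually: pair each element with a random key, group by that key, then drop the keys. The class and method names are hypothetical and only for illustration; this is not how Beam implements it. The point is that every input element should come out exactly once, which is what our tests assert.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration only; this is NOT Beam code.
class ReshuffleSketch {

    // Models "pair with random key -> group by key -> drop keys".
    static <T> List<T> reshuffleViaRandomKey(List<T> input, Random random) {
        Map<Integer, List<T>> grouped = new HashMap<>();
        for (T element : input) {
            // Step 1: assign a random key to the element.
            int key = random.nextInt();
            // Step 2: group elements that happen to share a key.
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(element);
        }
        // Step 3: discard the keys and emit every grouped value.
        List<T> output = new ArrayList<>();
        for (List<T> values : grouped.values()) {
            output.addAll(values);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("r1", "r2", "r3", "r2");
        List<String> output = reshuffleViaRandomKey(input, new Random());

        // Order may change, so compare sorted copies (multiset equality).
        List<String> sortedIn = new ArrayList<>(input);
        List<String> sortedOut = new ArrayList<>(output);
        Collections.sort(sortedIn);
        Collections.sort(sortedOut);
        System.out.println("no elements lost: " + sortedIn.equals(sortedOut));
    }
}
```

In the real pipeline the order of `records` does not matter to us, only that nothing is lost or duplicated between the Flatten and the next step.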

Dean Hiller
