I've recently started learning Apache Beam and came across some Python code like this:

lines = p | 'read' >> ReadFromText(known_args.input)

# Count the occurrences of each word.
def count_ones(word_ones):
  (word, ones) = word_ones
  return (word, sum(ones))

counts = (lines
          | 'split' >> (beam.ParDo(WordExtractingDoFn())
                        .with_output_types(unicode))
          | 'pair_with_one' >> beam.Map(lambda x: (x, 1))
          | 'group' >> beam.GroupByKey()
          | 'count' >> beam.Map(count_ones))

From: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount.py#L92

What are the syntax and usage of | and >> in Python?

vego
  • 3
    Does this answer your question? [Pipe character in Python](https://stackoverflow.com/questions/5988665/pipe-character-in-python), also [>> operator in Python](https://stackoverflow.com/questions/3411749), also [Explain Apache Beam python syntax](https://stackoverflow.com/questions/43796046/explain-apache-beam-python-syntax) – metatoaster Nov 12 '19 at 06:14
  • 2
    This looks like the behaviour of the operators is overwritten. What they do (now) should be found in the library's documentation. – Klaus D. Nov 12 '19 at 06:18

1 Answer

By default, | is the bitwise OR operator and >> is the right shift operator, but Python lets you overload operators. To give | and >> a custom meaning for your own class, define the two dunder (magic) methods __or__ and __rshift__:

class A:
    def __or__(self, other):
        # Called for `a | other`
        pass
    def __rshift__(self, other):
        # Called for `a >> other`
        pass

I recommend reading more about the Python data model.
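To make the overloading concrete, here is a minimal sketch. The class `Tag` and its `text` attribute are hypothetical names invented for illustration and have nothing to do with Beam; the point is only that `a | b` and `a >> b` end up calling the methods you define.

```python
class Tag:
    """Hypothetical toy class, used only to show operator overloading."""

    def __init__(self, text):
        self.text = text

    def __or__(self, other):
        # Invoked by Python for `self | other`.
        return Tag(self.text + "|" + other.text)

    def __rshift__(self, other):
        # Invoked by Python for `self >> other`.
        return Tag(self.text + ">>" + other.text)


combined = Tag("a") | Tag("b")
shifted = Tag("a") >> Tag("b")

print(combined.text)   # a|b
print(shifted.text)    # a>>b

# On plain integers the operators keep their built-in meaning.
print(6 | 3, 8 >> 2)   # 7 2
```

The operators themselves stay the same; only what they do for instances of your class changes.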

Looking at the Beam Python SDK, __or__ is overloaded in the PTransform class:

  def __or__(self, right):
    """Used to compose PTransforms, e.g., ptransform1 | ptransform2."""
    if isinstance(right, PTransform):
      return _ChainedPTransform(self, right)
    return NotImplemented
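The wordcount snippet also puts a plain string on the left of `>>` and a collection on the left of `|`. When the left operand does not handle an operator, Python tries the reflected method (`__rrshift__`, `__ror__`) on the right operand. Also, `>>` binds tighter than `|`, so `x | 'label' >> t` is read as `x | ('label' >> t)`. The sketch below shows only this Python mechanism: `MiniTransform` is a hypothetical stand-in, not Beam's PTransform, and Beam's real classes are more involved.

```python
class MiniTransform:
    """Hypothetical stand-in for a transform; not Beam's PTransform."""

    def __init__(self, fn, label=None):
        self.fn = fn
        self.label = label

    def __rrshift__(self, label):
        # Reached for `'label' >> transform`: str does not handle `>>`
        # with this operand, so Python tries the right operand's
        # reflected method.
        return MiniTransform(self.fn, label)

    def __ror__(self, items):
        # Reached for `items | transform` when `items` (a list here)
        # does not handle `|` itself.
        return [self.fn(item) for item in items]


words = ["a", "b"]

named = "pair_with_one" >> MiniTransform(lambda w: (w, 1))
print(named.label)  # pair_with_one

# Parsed as: words | ("pair_with_one" >> MiniTransform(...))
pairs = words | "pair_with_one" >> MiniTransform(lambda w: (w, 1))
print(pairs)        # [('a', 1), ('b', 1)]
```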
adnanmuttaleb
  • 1
    this is the best explanation I found so far. It's 2021 now - and still no human-readable documentation from Google explaining their Dataflow syntax... I guess the assumption is that those willing to use it will happily read Apache Beam source code to understand what it is doing :)) – Marina Oct 21 '21 at 15:42
  • docs here https://beam.apache.org/documentation/programming-guide/#transforms – David Xia Apr 18 '22 at 21:58