What's the difference between a pane and a window? Incoming elements are grouped into windows. What, then, does a pane contain?

I took the following code from the Beam docs:

.of(new DoFn<String, String>() {
  @ProcessElement
  public void processElement(@Element String word, PaneInfo paneInfo) {
    // paneInfo describes the pane this element was emitted in
  }
})

Does each element belong to one pane, or to multiple panes? I need a simple analogy to understand panes and windows.

bigbounty

1 Answer

A windowing strategy partitions data by event time. One element can belong to multiple windows (for example, with sliding windows).
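
To make the "multiple windows" point concrete, here is a minimal sketch in plain Java (not the Beam API) of how a timestamp maps to sliding windows. It assumes windows start at multiples of the period and cover the half-open range [start, start + size); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowAssignment {
    /** Returns the start time of every sliding window that contains the timestamp. */
    public static List<Long> windowStarts(long timestamp, long size, long period) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is <= timestamp (assumption: starts are multiples of period).
        long lastStart = Math.floorDiv(timestamp, period) * period;
        // Walk backwards while the window [start, start + size) still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= period) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // 60-second windows that start every 30 seconds; element at t = 70 s.
        System.out.println(windowStarts(70, 60, 30)); // prints [60, 30]
        // Fixed windows are the special case size == period: exactly one window.
        System.out.println(windowStarts(70, 60, 60)); // prints [60]
    }
}
```

With a 60-second size and a 30-second period, the element at 70 s falls into both [30, 90) and [60, 120), which is why each element can show up in more than one window's panes.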

A pane is emitted by a trigger for a window. A window can emit multiple panes, one for each time its trigger fires. If you don't set a trigger, the window emits only one pane, when the window closes (that is, when the watermark passes the end of the window).

The accumulation mode then determines how the data in a window's successive panes relates: whether each pane repeats the earlier elements or carries only the new ones.

You can think of a window as a class and a pane as an instance of that class. An element can belong to one or more windows, and windows use their elements to emit panes.

More details can be found in the programming guide, in the sections about windows and triggers:

When you specify a trigger, you must also set the window's accumulation mode. When a trigger fires, it emits the current contents of the window as a pane. Since a trigger can fire multiple times, the accumulation mode determines whether the system accumulates the window panes as the trigger fires, or discards them.

To set a window to accumulate the panes that are produced when the trigger fires, invoke .accumulatingFiredPanes() when you set the trigger. To set a window to discard fired panes, invoke .discardingFiredPanes().
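
The difference between the two modes can be sketched with a small plain-Java simulation. This is only a model of the behavior described above, not Beam code: the class `PaneAccumulationSketch`, the method `firePanes`, and the sample elements are all made up for illustration, and the trigger is assumed to fire once after each batch of arrivals.

```java
import java.util.ArrayList;
import java.util.List;

public class PaneAccumulationSketch {
    /**
     * Simulates a single window. arrivals.get(i) holds the elements that arrived
     * since the previous trigger firing. Returns the contents of each emitted pane.
     */
    public static List<List<String>> firePanes(List<List<String>> arrivals, boolean accumulating) {
        List<List<String>> panes = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        for (List<String> batch : arrivals) {
            buffer.addAll(batch);
            // The trigger fires: the current buffer is emitted as one pane.
            panes.add(new ArrayList<>(buffer));
            if (!accumulating) {
                // Discarding mode drops what was already emitted.
                buffer.clear();
            }
        }
        return panes;
    }

    public static void main(String[] args) {
        List<List<String>> arrivals =
                List.of(List.of("a", "b"), List.of("c"), List.of("d", "e"));
        System.out.println(firePanes(arrivals, true));  // [[a, b], [a, b, c], [a, b, c, d, e]]
        System.out.println(firePanes(arrivals, false)); // [[a, b], [c], [d, e]]
    }
}
```

In both modes the window emits three panes for three firings; only what each pane contains differs. Each element lands in one pane per window in the discarding model, and in every later pane as well in the accumulating model.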

ningk
  • That's a good answer. Is it okay to think of a pane as a subset of a window? – bigbounty May 04 '21 at 06:39
  • They are kind of different things with similar time-related properties, not really a subset. You can think of a pane as a snapshot of the data in a window. If you are looking at a monthly view in a calendar, say for May, you have 31 days, so the calendar has 31 boxes. Let's say there is a fixed-length 24-hour window defined from the beginning of May. Then each box is actually a window, and the events/meetings scheduled in the calendar are the data. – ningk May 04 '21 at 17:21
  • Then say you define a trigger under which you look at today's events once per hour. For example, on May 4 you would check the calendar 24 times. Each time, you see a snapshot of the events currently scheduled for that day. Altogether, you emit 24 panes of currently scheduled events for today. – ningk May 04 '21 at 17:22
  • Suppose someone schedules an ad hoc meeting at 11 AM (event time) to have a discussion with you later today. If the calendar is working perfectly, you cannot see that event/data until your 11th pane. And if the calendar lags so that the meeting doesn't show up until 5 PM, you can only see the data after the 17th pane. – ningk May 04 '21 at 17:27
  • Note that accumulation mode controls the contents of successive panes, e.g. you can see just the new elements for this window since the last pane (discarding mode), or all elements that were ever added to this window (accumulating mode). – robertwb May 07 '21 at 00:46