Questions tagged [beam-sql]

BeamSQL is built on top of Apache Beam Java SDK, as a relational API for unified batch and streaming data processing.

BeamSQL is built on top of Apache Beam Java SDK, as a relational API for unified batch and streaming data processing.

BeamSQL features

  1. Connect heterogeneous storage systems - access data from different systems with ease.
  2. Pure SQL pipelines in SQL shell - lower the barrier to write data processing pipelines.
  3. Embedded SQL in pipelines - more flexibility and productivity.
  4. Unified bath and streaming semantics - towards one SQL for batch, streaming and mixed use cases.

Resources

45 questions
2
votes
1 answer

Query Avro Schema using Beam SQL

I'm trying to read avro files with Apache Beam and use Beam SQL to transform the data. I'm still new in Beam and Java. Here's my simple code: public class BeamSQLReadAvro { @SuppressWarnings("serial") public static void main(String[] args)…
Yusata
  • 199
  • 1
  • 3
  • 16
2
votes
1 answer

Beam SQL Not Firing

I am building a simple prototype wherein I am reading data from Pubsub and using BeamSQL, code snippet as below val eventStream: SCollection[String] = sc.pubsubSubscription[String]("projects/jayadeep-etl-platform/subscriptions/orders-dataflow") …
2
votes
2 answers

Apache beam: SQL aggregation outputs no results for Unbounded/Bounded join

I am working on an apache beam pipeline to run a SQL aggregation function.Reference: https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/BeamSqlDslJoinTest.java#L159. The example here…
Akshata
  • 1,005
  • 2
  • 12
  • 22
2
votes
2 answers

How to fix "Joining unbounded PCollections is currently only supported for non-global windows with triggers" in Apache Beam

I'm trying to join 2 unbounded sources using Apache Beam Java SDK. While Joining Im getting the below error message. Exception in thread "main" java.lang.UnsupportedOperationException: Joining unbounded PCollections is currently only supported…
Gowtham
  • 87
  • 1
  • 14
1
vote
0 answers

Python Beam SqlTransform unknown coder exception when getting data from PTransform

There's this PTransform that is mapping data to a beam.Row: class MapToBeamRow(beam.PTransform): def expand(self, pcoll: PCollection[Any]) -> PCollection[beam.Row]: return ( pcoll | beam.Map(lambda x:…
Ben Konz
  • 43
  • 5
1
vote
2 answers

How to convert PCollection to PCollection> in JAVA

I'm trying to convert a tablerow containing multiple values to a KV. I can achieve this in a DoFn but that adds more complexity to the code that I want to write further and makes my job harder. (Basically I need to perform CoGroupBy operation on two…
1
vote
1 answer

Apache Beam SQL error in Python - ValueError: Unsupported type: Any

I wrote an example based on the following code https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/sql_taxi.py I am getting an error message /usr/local/lib/python3.6/dist-packages/apache_beam/typehints/schemas.py in…
Jean Bouez
  • 41
  • 5
1
vote
0 answers

BEAM SQL and RECORD column type

I am trying to select records from a data file into a PCollection using Beam SQL. My data file has the below AVRO…
Vinod
  • 83
  • 1
  • 8
1
vote
1 answer

What's the difference between Dataflow sql, Beam SQL (Zeta sql or CALCITE SQL)?

While browsing I just came across Dataflow SQL. Is it any different from beamSQL?
1
vote
1 answer

What is the alternative for side inputs in apache beam

I am trying to join multiple kafka streams & lookups using Apache Beam. Im using side inputs for handling lookup tables and everything worked out in direct runner. But, when i try to run it in spark mode or flink mode, i learnt that side inputs are…
1
vote
1 answer

How to refresh/reload side input on every window

I am using Apache beam to join multiple streams along with some lookups. I have 2 scenarios, If, the lookup size is huge, I wanted the side input to reload/refresh for every record processing (i.e. I will query the database with where clause) and if…
Gowtham
  • 87
  • 1
  • 14
1
vote
1 answer

Build Nested structure using BeamSQL

In BigQuery we have "ARRAY_AGG" function which helps to convert the normal collection to Nested collection. Is there a similar way to build same kind of nested structure collection using BeamSQL?. Something like below query in BeamSQL, "Select…
lourdu rajan
  • 329
  • 1
  • 5
  • 24
1
vote
1 answer

BeamSQL Group By query problem with Float value

Tried to get the unique value from the BigQuery table using BeamSQL in Google Dataflow. Using Group By clause implemented the condition in BeamSQL (sample query below). One of the column has float data type. While executing the Job got below…
lourdu rajan
  • 329
  • 1
  • 5
  • 24
1
vote
1 answer

How to add google cloud pubsub as a source in Beam SQL shell?

I am trying out BeamSQL in shell and want to test how unbounded sources work in terms of usability and performance. Reading the documentation over here, I created an external table as follows- CREATE EXTERNAL TABLE pubsub_table (event_timestamp…
Abhishek
  • 681
  • 1
  • 6
  • 25
0
votes
0 answers

Apache beam ZetaSQL ANALYTICAL FUNCTIONS not enabled

I have a PCollection upon which I want to apply SQLTransform which runs an aggregation query as: SELECT *, ANY_VALUE(col1) OVER (PARTITION BY col2 ORDER BY timestamp) as some_name FROM PCOLLECTION Here is the code below: Schema appSchema = …
HUsr
  • 1
  • 1
1
2 3