Questions tagged [flink-sql]

Apache Flink features two relational APIs, SQL and Table API, as unified APIs for stream and batch processing.

Apache Flink features two relational APIs:

  1. SQL (via Apache Calcite)
  2. Table API, a language-integrated query (LINQ) interface

Both APIs are unified APIs for stream and batch processing. This means that a query returns the same result regardless of whether it is applied to a static data set or a data stream. SQL queries are parsed and optimized by Apache Calcite; Table API queries are optimized by Calcite as well.

Both APIs are tightly integrated with Flink's DataStream and DataSet APIs.

667 questions
8
votes
4 answers

Get nested fields from Kafka message using Apache Flink SQL

I'm trying to create a source table using Apache Flink 1.11 where I can get access to nested properties in a JSON message. I can pluck values off root properties but I'm unsure how to access nested objects. The documentation suggests that it should…
bash721
  • 140
  • 2
  • 10
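In recent Flink versions, nested JSON objects are typically declared as ROW types in the source DDL and addressed with dot notation. A minimal sketch, assuming a hypothetical `events` topic and illustrative field names:

```sql
-- Hypothetical Kafka source; topic, fields, and broker address are illustrative.
CREATE TABLE events (
  id STRING,
  -- nested JSON object mapped to a ROW type
  payload ROW<`user` ROW<name STRING, city STRING>, score INT>
) WITH (
  'connector' = 'kafka',
  'topic' = 'events',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

-- nested fields are then addressed with dot notation
SELECT id, payload.`user`.name FROM events;
```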
6
votes
1 answer

Does AWS Glue Schema Registry support being used as a Flink SQL Catalog?

Does AWS Schema Registry support being used as an SQL Catalog within Flink SQL applications? For instance, the documentation shows an example of using a Hive Catalog: CREATE CATALOG hive WITH…
John
  • 10,837
  • 17
  • 78
  • 141
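For reference, the Hive catalog pattern the question alludes to looks roughly like this (the configuration path is illustrative); whether a Glue-backed equivalent exists is exactly what the question asks:

```sql
-- Hive catalog registration as shown in the Flink docs;
-- 'hive-conf-dir' is an illustrative path.
CREATE CATALOG hive WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/opt/hive/conf'
);
USE CATALOG hive;
```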
6
votes
3 answers

Apache Flink: How to enable "upsert mode" for dynamic tables?

I have seen several mentions of an "upsert mode" for dynamic tables based on a unique key in the Flink documentation and on the official Flink blog. However, I do not see any examples / documentation regarding how to enable this mode on a dynamic…
Austin York
  • 808
  • 2
  • 10
  • 24
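Since Flink 1.12, upsert semantics on a unique key are typically expressed by declaring a PRIMARY KEY on the table, for example with the upsert-kafka connector. A sketch with hypothetical names:

```sql
-- Hypothetical upsert sink: rows with the same user_id overwrite each other.
CREATE TABLE user_scores (
  user_id STRING,
  score BIGINT,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'user-scores',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);
```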
5
votes
1 answer

How can we define nested json properties (including arrays) using Flink SQL API?

We have the following problem while using Flink SQL: we have configured Kafka Twitter connector to add tweets to Kafka and we want to read the tweets from Kafka in a table using Flink SQL. How can we define nested json properties (including arrays)…
mricat
  • 51
  • 1
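A common pattern for JSON arrays is to declare them as `ARRAY<ROW<...>>` and flatten them with CROSS JOIN UNNEST. A sketch with hypothetical tweet fields:

```sql
-- Hypothetical schema: each tweet carries an array of hashtag objects.
CREATE TABLE tweets (
  id STRING,
  hashtags ARRAY<ROW<tag STRING>>
) WITH (
  'connector' = 'kafka',
  'topic' = 'tweets',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

-- one output row per (tweet, hashtag) pair
SELECT t.id, h.tag
FROM tweets AS t
CROSS JOIN UNNEST(t.hashtags) AS h (tag);
```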
5
votes
0 answers

PyFlink extract nested fields from JSON array

I'm trying to extract a few nested fields in PyFlink from JSON data received from Kafka. The JSON record schema is as follows. Basically, each record has a Result object within which there's an array of objects called data. I'm trying to extract the…
sumeetkm
  • 189
  • 1
  • 7
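When only a single element is needed, Flink SQL arrays can also be indexed directly (subscripts are 1-based). A sketch assuming the hypothetical `Result`/`data` structure described in the question:

```sql
-- Assumes a column declared roughly as:
--   Result ROW<data ARRAY<ROW<sensor_id STRING, `value` DOUBLE>>>
-- Array subscripts in Flink SQL start at 1.
SELECT Result.data[1].sensor_id, Result.data[1].`value`
FROM readings;
```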
5
votes
2 answers

Apache Flink Resource Planning best practices

I'm looking for recommendations/best practices in determining the required optimal resources for deploying a streaming job on a Flink cluster. The resources are: no. of task slots per TaskManager, optimal memory allocation for TaskManager, max parallelism
ardhani
  • 303
  • 1
  • 11
5
votes
1 answer

Is there a way to determine total job parallelism or the number of slots required to run a Flink job (before it is run)?

Is there a way to determine the total number of task slots that will be required to run the job from either the execution plan or in some other way without having to actually start the job first. According to this doc:…
SherinThomas
  • 1,881
  • 4
  • 16
  • 20
4
votes
0 answers

FLINK SQL: row.getFieldAs returns a LocalDateTime instead of a Timestamp?

Flink: 1.13.2 I have a StreamTableEnvironment tableEnv that reads streaming data from a KafkaSource. From this tableEnv, I filter my data and transform it back to a DataStream. DataStream myStreamData = env.fromSource(source,…
4
votes
1 answer

How can I use Flink to implement a streaming join between different data sources?

I have data coming from two different Kafka topics, served by different brokers, with each topic having different numbers of partitions. One stream has events about ads being served, the other has clicks: ad_serves: ad_id, ip, sTime ad_clicks:…
David Anderson
  • 39,434
  • 4
  • 33
  • 60
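With time attributes declared on both tables, this kind of serve/click correlation is usually written as an interval join. A sketch reusing the field names from the question (the window length is illustrative):

```sql
-- Interval join: each click matches a serve of the same ad_id
-- that happened up to one hour earlier.
SELECT s.ad_id, s.sTime, c.cTime
FROM ad_serves AS s
JOIN ad_clicks AS c
  ON s.ad_id = c.ad_id
 AND c.cTime BETWEEN s.sTime AND s.sTime + INTERVAL '1' HOUR;
```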
4
votes
1 answer

How to sort a stream by event time using Flink SQL

I have an out-of-order DataStream that I want to sort so that the events are ordered by their event time timestamps. I've simplified my use case down to where my Event class has just a single field -- the timestamp field: public static void…
David Anderson
  • 39,434
  • 4
  • 33
  • 60
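In Flink SQL, a stream can be sorted when the leading ORDER BY expression is an ascending event-time attribute. A minimal sketch, assuming a hypothetical `events` table with a WATERMARK declared on `event_time`:

```sql
-- Valid on a stream only because event_time is a time attribute
-- (declared with a WATERMARK clause on the source table).
SELECT *
FROM events
ORDER BY event_time ASC;
```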
4
votes
2 answers

Apache Flink error java.lang.ClassNotFoundException: org.apache.flink.table.sources.TableSource?

I am writing a streaming service in Apache Flink. I am basically picking up data from a CSV file using org.apache.flink.table.sources.CsvTableSource. Below is the code for the same: StreamTableEnvironment streamTableEnvironment = TableEnvironment …
Srivatsa Sinha
  • 193
  • 2
  • 12
3
votes
1 answer

How does parallelism work when using Flink SQL?

I understand that in the Flink Datastream world parallelism means each slot will get a subset of events [1]. A Flink program consists of multiple tasks (transformations/operators, data sources, and sinks). A task is split into several…
John
  • 10,837
  • 17
  • 78
  • 141
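For SQL jobs, operator parallelism is typically steered through configuration rather than per-operator calls. A sketch using SQL-client syntax (option names as in recent Flink versions):

```sql
-- Default parallelism for all operators generated from SQL queries
SET 'parallelism.default' = '4';
-- Table/SQL-specific override (takes precedence when set)
SET 'table.exec.resource.default-parallelism' = '8';
```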
3
votes
2 answers

How to reference nested JSON within PyFlink SQL when JSON schema varies?

I have a stream of events I wish to process using PyFlink, where the events are taken from AWS EventBridge. The events in this stream share a number of common fields, but their detail field varies according to the value of the source and/or…
John
  • 10,837
  • 17
  • 78
  • 141
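A common workaround for a varying payload is to declare the `detail` column as a plain STRING and extract fields lazily with JSON_VALUE (available since roughly Flink 1.15). A sketch with hypothetical EventBridge fields:

```sql
-- Keep the variable part opaque and parse it per-source at query time.
CREATE TABLE eventbridge_events (
  source STRING,
  detail STRING  -- raw JSON, schema varies by source
) WITH (
  'connector' = 'kafka',
  'topic' = 'eventbridge',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

SELECT JSON_VALUE(detail, '$.state') AS instance_state
FROM eventbridge_events
WHERE source = 'aws.ec2';
```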
3
votes
1 answer

What is the difference between Lookup and Processing Time Temporal join in Flink?

In my opinion, Processing Time Temporal Join is used for a stream and an external database and always joins against the latest value in the external database based on the join condition. Also, Processing Time Temporal Join is used when the external table is…
Dilibaba
  • 123
  • 8
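Both variants share the FOR SYSTEM_TIME AS OF syntax; what differs is whether the right side is a lookup table (e.g. JDBC) probed per record, or a versioned table tracked by watermarks. A processing-time sketch with hypothetical tables:

```sql
-- Processing-time temporal join: each order is enriched with the
-- latest known rate at the moment it is processed.
SELECT o.order_id, o.amount * r.rate AS converted
FROM orders AS o
JOIN currency_rates FOR SYSTEM_TIME AS OF o.proc_time AS r
  ON o.currency = r.currency;
```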
3
votes
2 answers

Flink sink filesystem as parquet - error on saving nested data

I am trying to convert JSON data to Parquet so that I can use Trino or Presto to query it. A sample JSON is as follows: {"name": "success","message": "test","id": 1, "test1": {"one": 1, "two": 2, "three": "t3"}, "test2": [1,2,3], "test3": [{"a":…
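Nested JSON can generally be carried into a Parquet filesystem sink by mirroring the structure with ROW and ARRAY types in the sink DDL. A sketch modeled on the sample record (field names taken from the question; the path and the truncated `test3` element type are assumptions):

```sql
-- Sink schema mirroring the nested JSON sample; the path is illustrative.
CREATE TABLE parquet_sink (
  name STRING,
  message STRING,
  id INT,
  test1 ROW<one INT, two INT, three STRING>,
  test2 ARRAY<INT>,
  test3 ARRAY<ROW<a STRING>>
) WITH (
  'connector' = 'filesystem',
  'path' = 's3://my-bucket/output/',
  'format' = 'parquet'
);
```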