2

I have one steam and a table in KSQL as mentioned below:

Stream name: DEAL_STREAM

Table name: EXPENSE_TABLE

When I run the below queries it displays only columns from the stream but no table columns are being displays.

Is this the expected output. If not am I doing something wrong?

SELECT TD.EXPENSE_CODE, TD.BRANCH_CODE, TE.EXPENSE_DESC
FROM DEAL_STREAM TD
LEFT JOIN EXPENSE_TABLE TE ON TD.EXPENSE_CODE = TE.EXPENSE_CODE
WHERE TD.EXPENSE_CODE LIKE '%NL%' AND TD.BRANCH_CODE LIKE '%AM%';

An output of the query is as shown below.

NL8232@#0          | AM   | null
NL0232@#0          | AM   | null
NL6232@!0          | AM   | null
NL5232^%0          | AM   | null
Matthias J. Sax
  • 59,682
  • 7
  • 117
  • 137
Zamir Arif
  • 341
  • 2
  • 13

1 Answers1

0

When I run the below queries it displays only columns from the stream but no table columns are being displays.

In a stream-table (left) join, the output records will contain null columns (for table-side columns) if there is not matching record in the table at the time of the join/lookup.

Is this the expected output. If not am I doing something wrong?

Is it possible that, for example, you wrote the (1) input data into the stream before you wrote (2) the input data into the table? If so, then the stream-table join query would have attempted to perform table-lookups at the time of (1) when no such lookup data was available in the table yet (because that happened later at time (2)). Because there was no such table data available, the join wrote output records where the table-side columns were null.

Note: This stream-table join in KSQL (and, by extension, Apache Kafka's Streams API, on which KSQL is built) is the pretty much the norm for joins in the streaming world. Here, only the stream-side of the stream-table join will trigger downstream join outputs, and if there's no matching for a stream record on the table-side at the time when a new input record is being joined, then the table-side columns will be null. Since this is, however, a common cause of user confusion, we are currently working on adding table-side triggering of join output to Apache Kafka's Streams API and KSQL. When such a feature is available, then your issue above would not happen anymore.

miguno
  • 14,498
  • 3
  • 47
  • 63
  • In this case. The data was loaded a few weeks back using Oracle Golden Gate from Oracle DB to Kafka. Now Today I created the stream and table from existing data in topics and set the offset to earliest so it replays everything. when join ran all the data was existing in streams and tables so ideally I should have got matches. Am I missing anything here? – Zamir Arif Apr 25 '18 at 13:21
  • I have 2 more questions 1. When new data is being added to the stream then for join the full scan of the table will be done or not? 2. Can we join 2 tables in 4.1.0 because I saw some discussion on GitHub a few days back? Is this implemented? – Zamir Arif Apr 25 '18 at 13:27
  • 1) Yes, when new data records are being added to the stream, then a subsequent join (for those new records) will lookup the corresponding table data (because the table data is now available). 2) Table-table joins are not supported in 4.1.0 yet, but will be added soon. – miguno Apr 25 '18 at 14:10
  • 1
    Michael G. Noll - I did following to cleanly test this issue. 1. Cleared all the topic data for streams in my data pipeline. 2. All the table still exists. 3. Run the initial load from OGG which populates my streams. 4. Checked for all typo etc. 5. Compare the code of the join to existing table code for description they exists. Infact description for all the code exists. 6. Still data from stream which code is being populated but data from table which is corresponding desc for code are not being populated. Please help – Zamir Arif Apr 26 '18 at 12:07
  • Hello, I have been facing the same issue in which data exist in the ktable and when i join it with a stream the columns of stream is printed but that of table shows blank columns please help !! – Raj Saraogi Aug 28 '19 at 05:44