10

Suppose I have 2 tables A and B. I create a MV(materialized view) with a join query of two tables, psuedo like:

create materialized view a_b engine = Memory as 
select * from(
    select * from A
) all inner join (
    select * from B
) using some_col;

I known that a_b is only updated when inserting data into table A and nothing else happen when push data to B. I want my MV have to update when both table are updated.

My workaround is to create another MV that change postition of A, B and point to a_b like

create materialized view a_b_2 to a_b as 
select * from(
    select * from B
) all inner join (
    select * from A
) using same_col;

I have some questions about this approach:
1. Are there any more legal way to archive same effect in clickhouse?
2. Suppose I have 2 incoming batches data BD_A and BD_B are going to insert to A and B simultaneously. Some data of 2 batches themself (BD_A_B) is fit join condition . Is there any chance that the MV lost those BD_A_B because MV a_b processes BD_A with before-inserted B and MV a_b_2 processes BD_B with before-inserted A.

Thang Nguyen
  • 1,110
  • 8
  • 17
  • 3
    There isn't an easy way to achieve this. The implementation of materialized view in ch relies on one and only one input block stream which can only be related to one table. It's also inefficient to update the view upon the right join table since it needs to recreate a hash table each time, or else keeping a large hash table and cosuming a lot of memory. – Amos Jul 10 '18 at 02:23
  • @nabongs I've got the same problem. But for me, it does not work even only `A` table is updated. My view is always empty. Do you have some solution? The code is exactly the same as yours. – Joozty Jul 11 '18 at 14:54
  • @Amos So the question is why would I use `Materialized View` if I cannot use join? It doesn't make sense for me. – Joozty Jul 11 '18 at 14:57
  • 1
    @Joozty For me, as mentioned, it works when insert to `A` and if I want to it auto update when `B` is updated I must create another `MV` point to first MV. – Thang Nguyen Jul 11 '18 at 15:28
  • What engines do you use for your `A` and `B` table? I asked a similar question on GitHub and I got an [answer](https://github.com/yandex/ClickHouse/issues/2612#issuecomment-404213745) that it is not possible. I am wondering it is working for you because it shouldn't have... – Joozty Jul 11 '18 at 15:54
  • @Joozty `MV` is mainly used for pre-aggregation in Clickhouse. – Amos Jul 22 '18 at 06:03

1 Answers1

2

As far as I understand, you are trying to have a workaround of a limitation.

Clickhouse does not support multiple source tables for a MV and they have quite good reasons for this. I actually asked this to devs and got this answer:

In ClickHouse materialized view behaves more like BEFORE INSERT TRIGGER, each time processing new block arrived with insert.

So that is quite natural limitation as inserts to 2 different table will come asynchronously and you usually expect to see in JOINs whole table not only newly arrived blocks.

ramazan polat
  • 7,111
  • 1
  • 48
  • 76