
I have a table with timeuuid as a clustering key.

CREATE TABLE event (
    domain TEXT,        
    createdAt TIMEUUID,
    kind TEXT,
    PRIMARY KEY (domain, createdAt)
);

I want to select data in the order of this clustering key with the following guarantee: once I have selected some records, no future insert will land before them. That way I can iterate through the records, checking for new events, without the risk of skipping any.

SELECT kind FROM event WHERE domain = ? AND createdAt > lastCreatedAtWeAreAwareOf

If I generate the timeuuid on the client and insert into Scylla in parallel, it is technically possible that a more recent timeuuid gets inserted before several older ones (say, due to a networking issue), and I might miss those records in my selects.
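The race described above can be reproduced with a small in-memory sketch. This is only an illustration: plain integers stand in for client-generated timeuuids, a Python set stands in for one partition, and `poll` and `simulate` are hypothetical helper names, not part of any driver.

```python
def poll(partition, last_seen):
    """Stand-in for the SELECT above: keys newer than last_seen, in order."""
    return sorted(key for key in partition if key > last_seen)


def simulate():
    """Show how a delayed write is skipped by a reader that only moves forward."""
    partition = set()
    last_seen = 0

    # The client generated keys 1, 2, 3 in order, but write 2 is still in flight.
    partition.update({1, 3})
    first_poll = poll(partition, last_seen)
    last_seen = first_poll[-1]  # the reader now believes it is caught up to 3

    # Write 2 finally arrives, sorted *before* the reader's position.
    partition.add(2)
    second_poll = poll(partition, last_seen)
    return first_poll, second_poll
```

Here the first poll returns keys 1 and 3, and the second poll returns nothing, so the row with key 2 is never seen by this reader.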

What are possible ways to resolve this?

I tried using the currentTimeUUID function, and it seems to work (values are monotonically increasing within the same partition key), but it creates a lot of duplicates (20-40 per partition key), i.e. I end up with lots of records that have exactly the same currentTimeUUID. I would really like a way to avoid duplicates: they complicate the select process and consume unnecessary resources.

I'm also curious: is there a risk of backward clock jumps when using the currentTimeUUID function?

let4be
  • Out of curiosity, how many parallel inserts are you doing? For currentTimeUUID to create 20-40 duplicates you would have to send a large number of writes with exactly the same timestamp. Is that right? – haaawk Apr 16 '20 at 05:29

1 Answer


EDITED

It seems there's a bug in Scylla: currentTimeUUID always generates duplicates for writes done at the same time through the same coordinator. I created an issue for it (https://github.com/scylladb/scylla/issues/6208). Thanks for bringing this up.

PREVIOUS ANSWER BELOW

If I generate the timeuuid on the client and insert into Scylla in parallel, it is technically possible that a more recent timeuuid gets inserted before several older ones (say, due to a networking issue), and I might miss those records in my selects.

Just to clarify, all writes will be stored in the right order. There will be a point in time when you will be able to read old enough writes in the right order. This means that one possible solution would be to make sure that select does not query too recent data. Thus leaving a window for 'late' writes to arrive and take their place in line. For example, you could use a select like this:

SELECT kind FROM event WHERE domain = ? AND createdAt > lastCreatedAtWeAreAwareOf AND createdAt < now() - 30s

I don't know whether it's OK for you to impose such a delay, though. This strategy also won't give you full certainty, because any write delayed by more than 30s will still be missed.
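One way to express the delayed read from a client is to compute the cutoff there and bind it as a parameter. The sketch below is only an illustration under assumptions: the query string relies on the standard CQL function minTimeuuid() for the upper bound, the 30-second lag is the value used above, and `read_cutoff` and `visible` are hypothetical helpers (the latter is an in-memory stand-in for the query, not a driver call).

```python
from datetime import datetime, timedelta, timezone

# Assumed query shape; minTimeuuid(?) converts the bound timestamp into the
# smallest timeuuid for that instant, giving an exclusive upper bound.
QUERY = (
    "SELECT kind, createdAt FROM event "
    "WHERE domain = ? AND createdAt > ? AND createdAt < minTimeuuid(?)"
)


def read_cutoff(now, lag=timedelta(seconds=30)):
    """Upper bound for the read; newer rows are left for a later poll."""
    return now - lag


def visible(rows, last_seen, cutoff):
    """In-memory stand-in for QUERY: rows strictly between the two bounds."""
    return [row for row in rows if last_seen < row[0] < cutoff]
```

The reader would then advance its position only to the newest row it actually received, never to the cutoff itself, so rows that become visible later are picked up by the next poll.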

I tried using the currentTimeUUID function, and it seems to work (values are monotonically increasing within the same partition key), but it creates a lot of duplicates (20-40 per partition key), i.e. I end up with lots of records that have exactly the same currentTimeUUID. I would really like a way to avoid duplicates: they complicate the select process and consume unnecessary resources.

You can reduce the chance of clustering key duplication by introducing an additional clustering key column, like this:

CREATE TABLE event (
    domain TEXT,        
    createdAt TIMEUUID,
    randomBit UUID,     -- or INT; any client-generated tie-breaker
    kind TEXT,
    PRIMARY KEY (domain, createdAt, randomBit)
);

and generate a value for it on the client in some good random way. If some aspect of the record is already guaranteed to be unique, use that as the extra clustering key column instead; it would work better than a random field.
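The effect of the extra column can be sketched in memory. This is an illustration only: a Python dict keyed by the full primary key stands in for the table (a CQL write with an existing primary key overwrites that row), `uuid.uuid4()` plays the role of the client-generated randomBit, and `primary_key` and `write_all` are hypothetical helpers.

```python
import uuid


def primary_key(domain, created_at, with_tiebreaker):
    """Build the primary key tuple, optionally adding a random tie-breaker."""
    if with_tiebreaker:
        return (domain, created_at, uuid.uuid4())
    return (domain, created_at)


def write_all(events, with_tiebreaker):
    """Apply writes with upsert semantics: same key means same row."""
    table = {}
    for domain, created_at, kind in events:
        table[primary_key(domain, created_at, with_tiebreaker)] = kind
    return table


# Two events in the same partition that received the same createdAt value.
events = [("example.com", 1000, "signup"), ("example.com", 1000, "login")]
collapsed = write_all(events, with_tiebreaker=False)
distinct = write_all(events, with_tiebreaker=True)
```

Without the tie-breaker the two writes share one key and only the last one survives; with it, both rows are kept. Rows with the same createdAt are then ordered by the random column, which is arbitrary but stable.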

haaawk
  • I cannot accept any chance of missing a write, nor can I delay reads (the data goes to the user, who won't wait). Seems like I will have to live with duplicates for now, until it's fixed – let4be Apr 16 '20 at 07:47
  • Just to clarify, am I absolutely safe doing selects without delays when using currentTimeuuid for inserts? – let4be Apr 16 '20 at 07:54
  • There's always a chance with UUID and TimeUUID that there will be a collision. It seems that many people believe the chance is so small that it is practically irrelevant but theoretically you can have collisions. – haaawk Apr 16 '20 at 08:13
  • You don't have to wait for the fix @let4be - you can add an additional UUID clustering column. This would practically guarantee no duplicates if you generate this UUID value on the client. Or you can model the clustering key as (timestamp + UUID) to save storage. The timestamp can be obtained with something like now() and the UUID can be generated on the client. – haaawk Apr 16 '20 at 08:14
  • I'm aware that TimeUUID has a chance of collision. However, a collision in my use case just degrades the performance and usability of the select (we may pull duplicate records between selects). I'm more interested in a guarantee that the DB will NOT generate (and insert) ANY record with a TimeUUID below ANY previously inserted record. If this is violated, my selects will miss very important data, because they assume that once they have read something, no records will be added before it in the future (they expect monotonically increasing TimeUUIDs only). Please note I'm always reading within a partition key. – let4be Apr 16 '20 at 08:18
  • There's no such guarantee. Take two writes A and B happening at the same time T. It is possible that the TimeUUID generated for A sorts before the TimeUUID generated for B, yet for a short period of time your select sees only write B. This is because Scylla is a distributed system and the network imposes a nonconstant delay. Notice the 'for a short period of time'. In the long run both will be present in the select. – haaawk Apr 16 '20 at 08:46
  • The only solution I can think of off the top of my head is to partition this table a bit more, so that each partition is populated by just one data loader and the loader does this sequentially. Meaning it does a write, waits for it to finish successfully, and only then does the next write. This way you would have a guarantee that you won't miss anything from a partition. – haaawk Apr 16 '20 at 08:54
  • Thanks @haaawk for your answers! I might need to reevaluate this part of the system. I also considered using CDC (but it brings a solid chunk of complexity with generations and stuff); it also seems CDC might have the same underlying issue for my use case... Probably I should drop TimeUUID altogether and just use an external service like Redis streams, which guarantees the order of reads as it's essentially an append-only structure (and use the provided order to query Scylla by PK exclusively)... But then there are questions of reliability; this stuff is very non-trivial... – let4be Apr 16 '20 at 08:59
  • I don't know your whole system but if you can create more partitions and make sure each partition is populated sequentially, you would have the guarantee you need. Good luck! – haaawk Apr 16 '20 at 09:02
  • FYI @let4be a fix was posted for the issue opened: https://github.com/scylladb/scylla/issues/6208 – TomerSan Apr 17 '20 at 00:42
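
The one-loader-per-partition suggestion from the comments above could look roughly like the following. It is a minimal sketch under stated assumptions, not anything from Scylla or a driver: `SequentialLoader` is a hypothetical class, `send` is a hypothetical callable that blocks until the write is acknowledged and raises on failure, and a plain per-partition sequence number is swapped in for the timeuuid to keep the ordering obvious.

```python
class SequentialLoader:
    """Hypothetical single writer for one partition; one write in flight at a time."""

    def __init__(self, send):
        self._send = send    # assumed to block until acknowledged, raise on failure
        self._next_seq = 1   # next clustering key for this partition

    def write(self, kind):
        seq = self._next_seq
        self._send(seq, kind)     # do not continue until this write has succeeded
        self._next_seq = seq + 1  # advance only after a successful write
        return seq
```

Because a key is only handed out after the previous write has been acknowledged, a failed write is retried with the same key rather than leaving a gap that a later write could fill behind the reader's position.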