56

For Cassandra, do UPDATEs become an implied INSERT if the selected row does not exist? That is, if I say

 UPDATE users SET name = "Raedwald" WHERE id = 545127

and id is the PRIMARY KEY of the users table, and the table has no row with a key of 545127, will that be equivalent to

 INSERT INTO users (id, name) VALUES (545127, "Raedwald")

I know that the opposite is true: an INSERT for an id that already exists becomes an UPDATE of the row with that id. Older Cassandra documentation talked about inserts actually being "upserts" for that reason.

I'm interested in the case for CQL3, Cassandra version 1.2+.

Raedwald
  • 46,613
  • 43
  • 151
  • 237

4 Answers4

57

Yes, for Cassandra UPDATE is synonymous with INSERT, as explained in the CQL documentation where it says the following about UPDATE:

Note that unlike in SQL, UPDATE does not check the prior existence of the row: the row is created if none existed before, and updated otherwise. Furthermore, there is no mean to know which of creation or update happened. In fact, the semantic of INSERT and UPDATE are identical.

For the semantics to be different, Cassandra would need to do a read to know if the row already exists. Cassandra is write optimized, so you can always assume it doesn't do a read before write on any write operation. The only exception is counters (unless replicate_on_write = false), in which case replication on increment involves a read.

Taky
  • 5,284
  • 1
  • 20
  • 29
Richard
  • 11,050
  • 2
  • 46
  • 33
  • Hi, i have a question , update on key is same as insert , but if i want to update data on another column (such as username Or ... ) . is same as insert ? what the best solution for merge bulk data ? – محمد Jan 30 '18 at 10:33
37

Unfortunately the accepted answer is not 100% accurate. inserts are different than updates:

cqlsh> create table ks.t (pk int, ck int, v int, primary key (pk, ck));
cqlsh> update ks.t set v = null where pk = 0 and ck = 0;
cqlsh> select * from ks.t where pk = 0 and ck = 0;

 pk | ck | v
----+----+---

(0 rows)
cqlsh> insert into ks.t (pk,ck,v) values (0,0,null);
cqlsh> select * from ks.t where pk = 0 and ck = 0;

 pk | ck | v
----+----+------
  0 |  0 | null

(1 rows)

Scylla does the same thing.

In Scylla and Cassandra rows are sequences of cells. Each column gets a corresponding cell (or a set of cells in the case of non-frozen collections or UDTs). But there is one additional, invisible cell - the row marker (in Scylla at least; I suspect Cassandra has something similar).

The row marker makes a difference for rows in which all other cells are dead: a row shows up in a query if and only if there's at least one alive cell. Thus, if the row marker is alive, the row will show up, even if all other columns were previously set to null using e.g. updates.

inserts create a live row marker, while updates don't touch the row marker, so clearly they are different. The example above illustrates that. One could argue that row markers are "internal" to Cassandra/Scylla, but as you can see, their effects are visible. Row markers affect your life whether you like it or not, so it may be useful to remember about them.

It's sad that no documentation mentions row markers (well, I found this: https://docs.scylladb.com/architecture/sstable/sstable2/sstable-data-file/#cql-row-marker but it's in the context of explaining SSTable internals, which is probably dedicated to Scylla developers more than to users).

Bonus: a cell delete:

delete v from ks.t where pk = 0 and ck = 0

is the same as a null update:

update ks.t set v = null where pk = 0 and ck = 0

indeed, a cell delete also doesn't touch the row marker. It only sets the specified cell to null.

This is different from a row delete:

delete from ks.t where pk = 0 and ck = 0

because row deletes insert a row tombstone, which kills all cells in the row (including the row marker). You could say that row deletes are the opposite of an insert. Updates and cell deletes are somewhere in between.

kbr
  • 371
  • 3
  • 3
  • 3
    That's an interesting call-out. I tried your example, and it worked just as you said. Of course, when you set a value `v` to something other than `null`, `update` does create a row marker, so it looks like it behaves differently for `null` values. That must be the "one alive cell" you referred to. Again, that's a good find! – Aaron Feb 05 '20 at 14:44
  • 2
    "Of course, when you set a value v to something other than null, `update` does create a row marker" nope! Try updating the value back to `null`. EDIT, more details: After `update`'ing `v` to non-`null`, then updating it back to `null`, the row will be gone. However, after setting `v` with an `insert`, then updating it to `null`, the row will be still there. Because the `insert` created a row marker. – kbr Feb 05 '20 at 14:46
  • 1
    The reason you see the row after updating `v` to non-`null` is because the `v` cell is alive. The row marker is not. After `insert` with non-`null` `v`, there are two alive cells: the `v` cell and the row marker. – kbr Feb 05 '20 at 14:51
  • 1
    That's a great explanation. It would be a good topic for a blog post! – Aaron Feb 05 '20 at 16:39
  • 2
    Thanks. We'll see about the blog post :) – kbr Feb 05 '20 at 17:01
  • 1
    Sorry for dredging this up a year later, but does all this mean if I only use `UPDATE` to insert data I never get a row marker, and if so, do you think this would cause any issues further down the line? – Matt Fellows Feb 12 '21 at 13:30
  • 4
    @MattFellows if you're not deleting data it doesn't matter. if you're deleting entire rows (using `delete from`) (without specifying the column) it doesn't matter. if you're deleting columns (using `delete X from`, where `X` is a column name) then: 1. if you used UPDATE, deleting all columns will cause the row to disappear 2. if you used INSERT, deleting all columns will not cause the row to disappear - it will appear with all regular columns equal to `null` (it will just have the primary key columns) so it depends on your use case and expectations – kbr Feb 15 '21 at 11:08
22

What one can do is this however:

UPDATE table_name SET field = false WHERE key = 55 IF EXISTS;

This will ensure that your update is a true update and not an upsert.

KingOfHypocrites
  • 9,316
  • 9
  • 47
  • 69
0

No, they are not equal, as @kbr has already explained. You can read more about the difference between UPDATE and INSERT statements in this Scylla docs article - this is the most in-depth explanation from the developers themselves.

In short, for the table:

CREATE TABLE ks.t (
  pk int, 
  ck int, 
  v int, 
  PRIMARY KEY (pk, ck)
);

the statement:

INSERT INTO ks.t (pk, ck, v) VALUES (0, 0, 0);

is equivalent to:

BEGIN UNLOGGED BATCH
    INSERT INTO ks.t (pk, ck) VALUES (0, 0);
    UPDATE ks.t SET v = 0 WHERE pk = 0 AND ck = 0;
APPLY BATCH;
winwin
  • 958
  • 7
  • 25