69

I'm getting this following error when trying to delete records from a table created through GCP Console and updated with GCP BigQuery Node.js table insert function.

UPDATE or DELETE DML statements are not supported over table stackdriver-360-150317:my_dataset.users with streaming buffer

The table was created without streaming features. And from what I'm reading in documentation Tables that have been written to recently via BigQuery Streaming (tabledata.insertall) cannot be modified using UPDATE or DELETE statements.

Does it mean that once a record has been inserted with this function into a table, there's no way to delete records? At all? If that's the case, does it mean that table needs to be deleted and recreated from scratch? If that's not the case. Can you please suggest a workaround to avoid this issue?

Thanks!


Including new error message for SEO: "UPDATE or DELETE statement over table ... would affect rows in the streaming buffer, which is not supported" -- Fh

Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
Diego
  • 1,678
  • 1
  • 16
  • 20

2 Answers2

57

To check if the table has a streaming buffer, check the tables.get response for a section named streamingBuffer or, when streaming to a partitioned table, data in the streaming buffer has a NULL value for the _PARTITIONTIME pseudo column, so even with a simple WHERE query can be checked.

Streamed data is available for real-time analysis within a few seconds of the first streaming insertion into a table but it can take up to 90 minutes to become available for copy/export and other operations. You probably have to wait up to 90 minutes so all buffer is persisted on the cluster. You can use queries to see if the streaming buffer is empty or not like you mentioned.

If you use load job to create the table, you won't have streaming buffer, but probably you streamed some values to it.


Note the answer below to work with tables that have ongoing streaming buffers. Just use a WHERE to filter out the latest minutes of data and your queries will work. -- Fh

Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
Pentium10
  • 204,586
  • 122
  • 423
  • 502
  • 4
    Thanks for the answer. It was helpful. I was able to delete records the next day after streamingBuffer was no longer included from tables.get API response. https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/get. I realized streamingBuffer is just a transient state. You just need to wait until it no longer shows up to update or delete records. As you explained above. – Diego Mar 29 '17 at 20:33
  • 1
    Thanks Diego for the question, I had the same confusion, and thank @Pentium10 for the answer. Is there a way to check the streaming buffer from the GCP Web UI? i.e. not through the Node.js API with table.get? – Peza Feb 01 '18 at 07:15
  • 1
    Just had a closer look - its pretty clear when there are elements in the streaming buffer. You just hit 'refresh' when viewing the table metadata. Thanks guys! – Peza Feb 01 '18 at 08:54
  • 5
    Is it possible to flush streaming buffer? – beloblotskiy Mar 01 '18 at 23:24
  • 1
    There is a feature request, please +1 vote counts https://issuetracker.google.com/issues/70324396 – Pentium10 Mar 02 '18 at 07:25
  • 4
    To check if you have a streaming buffer, press on a table > Details > Check in bottom if there is a header "Streaming buffer statistics". If yes, then you have something in the buffer. – WJA Jan 08 '21 at 17:11
  • Better yet, can we just turn the stupid thing off? – GPP Jun 05 '21 at 08:44
43

Make sure to change your filters so they don't include data that could be in the current streaming buffer.

For example, this query fails while I'm streaming to this table:

DELETE FROM `project.dataset.table` 
WHERE id LIKE '%-%'

Error: UPDATE or DELETE statement over table project.dataset.table would affect rows in the streaming buffer, which is not supported

You can fix it by only deleting older records:

DELETE FROM `project.dataset.table` 
WHERE id LIKE '%-%'
AND ts < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 40 MINUTE)

4282 rows affected.
Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
  • 10
    for me it worked with `< DATETIME_SUB(CURRENT_DATETIME(), INTERVAL 90 MINUTE)`, such as @Pentium10 said. Thanks! – Guy P Mar 16 '20 at 12:03
  • 8
    I know this question is a little bit late but what does `ts` mean? is that a column name on bigquery? – Kboyz Mar 10 '21 at 07:05
  • 1
    yeah I guess it will be a column where he keeps track of the insertion timestamp – Seba92 Apr 06 '21 at 16:44
  • @Felipe Hoffa, related: once records *are* successfully deleted, is it expected that they will continue to display in SELECT queries for a finite time? – GPP May 31 '21 at 15:43