3

I'm trying to retrieve the max value of a column in my aws keyspace table using python. I have it as a method in a class

        query = SimpleStatement('select min (timestamp) from '  + keyspace + '.'+ table_name) #+ ' limit 2'
        read = self.session.execute(query)
        
        return read.current_rows

That is my read function, and I call it :

access.read_latest_timestamp('gdapp_finance','policy_test').

I get the below error. Can someone assist me please.

InvalidRequest                            Traceback (most recent call last)
/tmp/ipykernel_223/2808458841.py in <module>
----> 1 access.read_latest_timestamp('gdapp_finance','policy_test')

/tmp/ipykernel_223/1193162473.py in read_latest_timestamp(self, keyspace, table_name)
     58         query = SimpleStatement('select min (timestamp) from '  + keyspace + '.'+ table_name , \
     59                                consistency_level=ConsistencyLevel.ONE) #+ ' limit 2'
---> 60         read = self.session.execute(query)
     61 
     62         return read.current_rows

~/.local/lib/python3.9/site-packages/cassandra/cluster.cpython-39-x86_64-linux-gnu.so in cassandra.cluster.Session.execute()

~/.local/lib/python3.9/site-packages/cassandra/cluster.cpython-39-x86_64-linux-gnu.so in cassandra.cluster.ResponseFuture.result()

InvalidRequest: Error from server: code=2200 [Invalid query] message="min is not yet supported." ```
airdmhund
  • 43
  • 4

2 Answers2

3

AWS Keyspaces only supports a subset of native CQL functions. Functions like MIN() and MAX() are not supported.

Similarly, user-defined functions and aggregates are not supported either.

For the list of supported Cassandra functions, see Built-in functions in Amazon Keyspaces.

For the list of supported Cassandra APIs, see Supported Cassandra APIs, operations, and data types in Amazon Keyspaces. Cheers!

Erick Ramirez
  • 13,964
  • 1
  • 18
  • 23
  • Thanks for this Erick. Indeed that was an eye-opener. Next question then will be how I can do such aggregate functions as i do not see them in the links provided, in other words, is there any alternative? Thanks – airdmhund Sep 08 '21 at 12:55
  • 1
    @airdmhund I would say that your best alternative would be to use an implementation of Cassandra which actually supports the functionality you need to use. – Aaron Sep 08 '21 at 13:21
  • 1
    Agreed with what @Aaron said. There's no alternative because user-defined functions (UDFs) are not supported either. Cheers! – Erick Ramirez Sep 08 '21 at 13:51
3

Functions like sum, min, max, avg, and count are not deterministic and give unpredictable performance due to collocating storage and compute on the same C* node or large partition or scatter gather multiple rows/partitions.

I would recommend writing a custom Lambda and run it against Keyspaces, or store an aggregate row

(let’s say the first/last row in my partition stores all aggregates, like {pk, _#sk, col1=min(col1), col2=max(col2),…, col3=count(col3) })

within a partition and update them based on the schedule.

Autumn88
  • 351
  • 1
  • 11