3

I have imported more than 1 core records in a table and when I do Select query count(*) it gives me error. I know it is a costly query but, can any one help me get a solution for the same.

SELECT COUNT(*) FROM TABLE1;

Error: OperationTimedOut: errors={'10.20.30.10': 'Client request timeout. See Session.execute_async'}, last_host=10.20.30.10

Erick Ramirez
  • 13,964
  • 1
  • 18
  • 23
Hareesha
  • 125
  • 9

2 Answers2

1

After performing multiple R&D I got a solution for count(*) issue.

Steps:

  1. Setup presto on Cassandra Cluster (I used presto-server-0.215 version (presto-server-0.215.tar.gz and used jdk: jdk-8u151-linux-x64.tar.gz)) 1.1. presto-server-0.215.tar.gz: https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.215/ 1.2. jdk-8u151-linux-x64.tar.gz: https://www.oracle.com/technetwork/java/javase/downloads/java-archive-javase8-2177648.html
  2. Install presto on one cassandra server and which will make it as coordinator and rest of the nodes in the cluster will be worker, please refer below URL for setting up presto. Refer URL: https://github.com/prestodb/presto/issues/3382
  3. You need add firewall rule for Presto Port which you have mentioned in config.properties file (I'm using RHEL 7.x OS)
  4. Do changes in launcher.py ---> Line number '214' path of jdk installed "command = ['/opt/jdk1.8.0_151/bin/java', '-cp', classpath]"
  5. Start presto ---> ./launcher start
  6. Open presto console http://localhost:8081 and you should see coordinator and worker nodes in the console.

  7. Download "presto-cli-0.215-executable.jar" (URL: https://prestodb.io/docs/current/installation/cli.html) and rename it to prestocli (Give 755 permission) and then test count(*) for a big table using 'prestocli' a table which has 75 Lakhs records which was giving error when we run in cqlsh. 7.1. CQLSH Error: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

  8. Below is the solution for count(*)...

[root@casdb01 bin]# ./prestocli --server localhost:8081 presto> SELECT count(*) FROM cassandra.datamart.big_table;

_col0

7587418 (1 row)

Query 20190118_070908_00005_38tiw, FINISHED, 1 node Splits: 1,298 total, 1,298 done (100.00%) 0:53 [7.59M rows, 7.24MB] [142K rows/s, 139KB/s]

  1. For any application query you can presto as the interface to perform count(*).

Special thanks to my team met who helped me to get this result (Venkatesh Bhat).

Hareesha
  • 125
  • 9
0

So if this is a production system probably don't, but you seem to be aware of that.

> cqlsh --request-timeout=3600
SELECT count(*) FROM table1 ALLOW FILTERING;

Your system is timing out because the query is expensive and that is what your error is. You can extend the timeout to get around this, but at some point cassandra will just fail to come back if your dataset is big enough. Additionally you may want to run your query with a consistency of ONE so that you don't trigger read compactions and it will come back faster. It will of course be less accurate.

You might also try: nodetool cfstats mykeyspace it will only be an estimate but its better than nothing.

Highstead
  • 2,291
  • 3
  • 26
  • 30
  • No Luck... I tried ALLOW FILTERING ... ------------------------------------------------------------------------------------------------------- SELECT count(*) FROM TABLE1 ALLOW FILTERING ; ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} ------------------------------------------------------------------------------- CFSTAT also does not give me any data as nothing is been written for now. – Hareesha Dec 13 '18 at 07:41
  • So in `nodetool status` all of your nodes are up, does it work? How much data is in there (you could try increasing the timeout further, did it take 3600 seconds to come back?)? What is your nodecount and replication factor? – Highstead Dec 13 '18 at 16:29
  • - nodetool status = Yes Nodes are up and running (UN), I have 2 nodes in a cluster for now and we are planning to go to 5 nodes (Not soon) - nodecount = We have started production last month and have 2 nodes in a cluster... both as seed nodes. Running into issues with Big Tables - records over 1 crore to 51 crore - Have kept time out value as suggested - Replication Factor: 2 - {'class': 'SimpleStrategy', 'replication_factor': '2'} Please suggest if you can recommand best practices followed in production environment – Hareesha Dec 14 '18 at 08:14
  • You should use 3 nodes as soon as possible. If one was to go down your quorum writes and reads would both fail until you added a replaced the third node. The best practices and improvement well be more obvious based on schema. I saw significant performance gains in AWS by JBOD'ing my HDDs on the nodes for the same cost. Its especially useful with elastic drives. – Highstead Dec 14 '18 at 15:39
  • WRT your issue If you're running 100s of millions of rows i wouldn't expect that query to come back in any reasonable timeframe. – Highstead Dec 14 '18 at 15:40
  • I tried all the options suggested... can you let me know if any changes I need to do in cassandra.yaml file or some setting which I can use cqsh commands to get the count or can I have some code (Java or python or ...) to perform the same. – Hareesha Dec 18 '18 at 06:57