
We have a fairly big Greenplum v4.3 cluster: 18 hosts, each running 3 segment instances, with approximately 40 cores and 60 GB of memory per host.

The table we have is 30 columns wide and has 0.1 billion rows. The query we are testing has a 3-10 second response time when there is no concurrency pressure. As we increase the number of queries fired in parallel, the latency increases from an average of 3 secs to around 50 secs, as expected.

But we've found that regardless of how many queries we fire in parallel, we only get a very low QPS (queries per second), almost just 3-5 queries/sec. We've set max_memory=60G, memory_limit=800MB, and active_statements=100, hoping the CPU and memory would be highly utilized, but they are still poorly used, around 30%-40%.

I have a strong feeling we went about loading up the cluster in parallel the wrong way, hoping to get the best out of the CPU and memory utilization, but it doesn't work as we expected. Is there anything wrong with the settings, or is there anything else I am not aware of?

Shengjie
  • Do you have sar available? What does the disk I/O look like on each host? Can your disk I/O keep up when you increase the number of queries? – Sung Yu-wei Sep 14 '17 at 16:55
  • Disk I/O, CPU, and memory are all below 50%, except the master node's CPU is a bit higher, max 60%. Regardless of how many queries we push into the cluster, we only got 3-5 queries/sec. – Shengjie Sep 17 '17 at 04:01
  • If you have a query that runs in about 10 seconds, with 5 queries it's expected to run in 50 seconds. Nothing so abnormal here. Have you tried query or database optimization? By database optimization you could create INDEX (https://gpdb.docs.pivotal.io/4390/admin_guide/ddl/ddl-index.html) and/or enable the Pivotal Query Optimizer (https://gpdb.docs.pivotal.io/4390/admin_guide/query/topics/query-piv-opt-overview.html). Also, have you checked this article: https://gpdb.docs.pivotal.io/4390/admin_guide/load/topics/g-optimizing-data-load-and-query-performance.html ? – MiguelKVidal Oct 03 '17 at 23:02
  • Also, is the time you gave us the time the database needs to fetch the data, or the time the app (or browser) needs to present the results? Even if it's the time in the database, have you considered the network time to return the data from your database to your workstation? Maybe there's another culprit (and it's not your database). – MiguelKVidal Oct 03 '17 at 23:05

2 Answers


There might be multiple reasons for such behavior.

Firstly, every Greenplum query uses no more than one processor core per logical segment. Say you have 3 segments on every host with 40 physical cores: each query will utilize at most 3 cores per host, so you will need roughly 40 / 3 ≈ 13 parallel queries to utilize all of your CPUs. So, for your number of cores per host it may be better to create more segments (gpexpand can do this). By the way, are the tables used in the queries compressed?
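To check the arithmetic above, you can count the primary segments per host from the standard Greenplum catalog (this is a read-only sketch; run it on the master):

```sql
-- Count primary segments per host.
-- role = 'p' selects primaries; content >= 0 excludes the master.
SELECT hostname, count(*) AS primary_segments
FROM gp_segment_configuration
WHERE role = 'p' AND content >= 0
GROUP BY hostname
ORDER BY hostname;
```

With 3 primaries per host reported here, a single query can drive at most 3 of the 40 cores on each host.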

Secondly, it may be a bad query. If you provide the query plan, it may help us understand. There are some query types in Greenplum where the master becomes a bottleneck.
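To capture a plan worth posting, EXPLAIN ANALYZE actually runs the query and reports real row counts and timings per slice (the table and predicate below are placeholders for your own query):

```sql
-- EXPLAIN shows planner estimates; EXPLAIN ANALYZE runs the query
-- and reports actual rows and timings per slice.
-- Replace my_table / my_column with your actual query.
EXPLAIN ANALYZE
SELECT *
FROM my_table
WHERE my_column = 42;
```

Plans where most work happens in a single slice on the master (e.g. a final sort or aggregation of many rows) are a common cause of low cluster utilization.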

Finally, there might be some bad OS or block-device settings.

marc_s

I think the documentation page Managing Resources might help you manage your resources:

  • You can use Resource Groups to limit/control your resources, especially the CONCURRENCY attribute (the maximum number of concurrent transactions, including active and idle transactions, that are permitted in the resource group).
  • Resource Queues help you limit ACTIVE_STATEMENTS.

Note: ACTIVE_STATEMENTS counts the total statements currently running. When each query costs ~50 s and new queries keep arriving, a low value may not work; maybe 5 * 50 is better. Also, you need to configure the memory/CPU settings so that your queries can proceed.
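As a sketch of the two mechanisms (queue, role, and limit values below are hypothetical examples; resource queues exist in Greenplum 4.3, while resource groups require Greenplum 5 or later):

```sql
-- Resource queue (Greenplum 4.3): cap concurrent statements and
-- per-queue memory, then assign the queue to a role.
CREATE RESOURCE QUEUE adhoc_queue
  WITH (ACTIVE_STATEMENTS=20, MEMORY_LIMIT='8000MB');
ALTER ROLE report_user RESOURCE QUEUE adhoc_queue;

-- Resource group equivalent (Greenplum 5+):
-- CREATE RESOURCE GROUP adhoc_group
--   WITH (CONCURRENCY=20, CPU_RATE_LIMIT=20, MEMORY_LIMIT=20);
-- ALTER ROLE report_user RESOURCE GROUP adhoc_group;
```

Statements beyond the limit wait in the queue rather than failing, so the limit chiefly trades latency for throughput.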

Lei Chi