2

I am looking for an efficient alternative to subqueries/joins for this query. Let's say I a table that stores information about companies with the following columns:

  • name: the name of the company
  • state: the state the company is located in
  • revenue: the annual revenue of the company
  • employees: how many employees this company has
  • active_business: wether or not the company is in business (1 = yes, 0 = no)

Let's say that from this table, I want to find out how many companies in each state meet the requirement for some minimum amount of revenue, and also how many companies meet the requirement for some minimum number of employees. This can be expressed as the following subquery (can also be written as a a join):

SELECT state,
       (
            SELECT count(*)
            FROM records AS a
            WHERE a.state = records.state
              AND a.revenue > 1000000
       ) AS companies_with_min_revenue,
       (
            SELECT count(*)
            FROM records AS a
            WHERE a.state = records.state
              AND a.employees > 10
       ) AS companies_with_min_employees
FROM records
WHERE active_business = 1
GROUP BY state

My question is this. Can I do this without the subqueries or joins? Since the query is already iterating over each row (there's no indexes), is there some way I can add a condition that if the row meets the minimum revenue requirements and is in the same state, it will increment some sort of counter for the query (similar to map/reduce)?

jarlh
  • 42,561
  • 8
  • 45
  • 63
user396404
  • 2,759
  • 7
  • 31
  • 42
  • 3
    There's no indexes? Well, there's your problem then. Solved. – Strawberry Jul 08 '16 at 06:19
  • Typically, this situation would be indexed. However, this isn't actually running on mysql. I'm using the JSON datastore in Memsql with dynamic fields submitted by users, so indexes on specific values wasn't the right path. Also, it already runs extremely fast since memsql is in-memory, so the indexes weren't needed. I submitted the question like this since it was logically equivalent to the problem in a simpler manner. – user396404 Jul 13 '16 at 20:53

1 Answers1

3

I think CASE and SUM will solve it:

SELECT state
    , SUM(CASE WHEN R.revenue > 1000000 THEN 1 ELSE 0 END) AS companies_with_min_revenue
    , SUM(CASE WHEN R.employees > 10 THEN 1 ELSE 0 END) AS companies_with_min_employees
    FROM records R
    WHERE R.active_business = 1
    GROUP BY R.state

As you can see, we will have a value of 1 per record with a revenue of greater than 1000000 (else 0), then we'll take the sum. The same goes with the other column.

Thanks to this StackOverflow question. You'll find this when you search "sql conditional count" in google.

Community
  • 1
  • 1
KaeL
  • 3,639
  • 2
  • 28
  • 56