5

Now I understand that this has been asked several times before, but I have tried to apply different existing solutions to my specific problems for quite a while without success. So I turn here in hope of some guidance.

I have a table called tblanswers, which contains answers linked to different questions in another table. What I want is to get the count for each answer for a specific question ID, but limit it to the n first answers each month.

Sample data from tblanswers:

id  qid answer  timestamp
72  162 2       1366027324
71  161 4       1343599200
70  162 2       1366014201
69  161 4       1366011700
68  162 2       1366006729
67  161 3       1366010948
66  162 2       1365951084

This is the query I have so far:

SELECT *, COUNT(*) c FROM(
    SELECT answer, timestamp, YEAR(FROM_UNIXTIME(timestamp)) yr, MONTH(FROM_UNIXTIME(timestamp)) mo FROM tblanswers
        WHERE qid = 161
            ORDER BY timestamp ASC
) q GROUP BY YEAR(FROM_UNIXTIME(timestamp)), MONTH(FROM_UNIXTIME(timestamp)), answer

That would give me something like this: (the dates and numbers in sample data is not accurate)

answer  yr      mo  c
1       2013    5   5
2       2013    5   3
3       2013    5   2
1       2013    6   5
2       2013    6   15
3       2013    6   7

Let's say I only want to see the first three answers in a month, then count could never be more than 3. How can I limit each month?

The final data should be a sum of each answer, like this:

answer num_answers
1      2
2      3
3      3

I think one of these solutions could work, but not how: http://code.openark.org/blog/mysql/sql-selecting-top-n-records-per-group http://code.openark.org/blog/mysql/sql-selecting-top-n-records-per-group-another-solution

Any help is appreciated. Thanks!

Tomas
  • 57,621
  • 49
  • 238
  • 373
Adam
  • 115
  • 8
  • 1
    If you're just getting a count, but not actually looking at the rows, can't you just use `LEAST(3, COUNT(*))` to cap it at 3? – Barmar Jun 28 '13 at 18:52
  • @Barmar Thanks, but I believe this solution has the same problem as SQL.injections below - it doesn't order the answers before capping them. I need the first n answers for each month. – Adam Jun 28 '13 at 22:13
  • If you're just getting a count, what difference does it make whether they're the first n, last n, or any random n? Is there more to the query than what you're showing? – Barmar Jun 28 '13 at 22:18
  • Well, if the sample data looked like this, ordered by date: 3,3,2,2,3,1,3,2,2 etc. and I want the count of each answer in the first three answers, wouldn't the order make a complete difference? – Adam Jun 28 '13 at 22:29
  • @Barmar In that case, the count of each answer of the first three would be: 1: 0 answers 2: 1 answers 3: 2 answers If I instead just sum all answers like this: 1: 1 answers 2: 4 answers 3: 4 answers and then capped it to three, the total would be 1: 1 answers 2: 3 answers 3: 3 answers which isn't the same. – Adam Jun 28 '13 at 22:32
  • I understand now. My confusion was because sometimes "answer" means a particular answer number, other times it means a row in `tblanswers`. Now I understand that you want to select the first three rows per question, and then get the count of answer IDs from them. – Barmar Jun 28 '13 at 22:37
  • What you need to do is use one of the **top-n-per-group** solutions in a subquery to get the first 3 rows by month, and then group that by `answer`. – Barmar Jun 28 '13 at 22:39
  • Yes, it was something like that I believed would be the correct solution. This is one of the most straight-forward top-n-per-group solutions I have found, still I don't even know where to begin translating it to my problem: http://www.zenddeveloper.com/get-top-n-rows-of-each-group-in-mysql/ – Adam Jun 28 '13 at 23:24
  • In this sql fiddle I pick the first 10 answers in may, 2013 and count them from sample data. How do I do this dynamically for each month? http://sqlfiddle.com/#!2/c4b62/5 – Adam Jun 29 '13 at 00:11

3 Answers3

0

This solution is based on the top-N-per-group method here

SELECT answer, COUNT(*) num_answers
FROM (SELECT answer, yearmonth,
             @rn := CASE WHEN @prevmonth = yearmonth
                         THEN @rn + 1
                         ELSE 1
                    END rn,
             @prevmonth := yearmonth
      FROM (SELECT @rn := NULL, @prevmonth := NULL) init,
           (SELECT answer,
                   YEAR(FROM_UNIXTIME(timestamp))*100+MONTH(FROM_UNIXTIME(timestamp)) yearmonth
            FROM tblanswers
            WHERE qid = 220
            ORDER BY timestamp) x) y

WHERE rn <= 3
GROUP BY answer

SQLFIDDLE

Barmar
  • 741,623
  • 53
  • 500
  • 612
  • Thank you Barmar! However, the "WHERE rn <= 3" part didn't cap the number of answers per month, but the number of answer options. So I added a LIMIT x after "ORDER BY timestamp" instead, and I think it's working now. Does it seem correct? http://sqlfiddle.com/#!2/c4b62/62 – Adam Jun 29 '13 at 16:35
  • Ok, your solution does indeed seem to be working, but only sometimes. Sometimes it returns the sum of all answers even though it should never be able to return more than n*(number of months) answers. It seems that it works if I run the same query twice, until I change the cap to another value. Any ideas why? – Adam Jun 29 '13 at 17:06
  • I've modified the query to include a subquery that initializes `@rn` and `@prevmonth`. This wasn't needed at sqlfiddle because each run is in its own SQL session, but if you do multiple queries in a session these variables will retain their values. – Barmar Jun 30 '13 at 00:25
  • Thanks Barmar, that still didn't work. However, this did: http://sqlfiddle.com/#!2/c4b62/70 – Adam Jun 30 '13 at 15:44
  • I changed the order of the subqueries. This is now the order from the blog post, so it's probably significant. Can you see if this works for you? – Barmar Jun 30 '13 at 17:27
0

What about this solution:

SELECT qid, answer, YEAR(FROM_UNIXTIME(timestamp)) yr, MONTH(FROM_UNIXTIME(timestamp)) mo, COUNT(*) no
FROM tblanswers
WHERE qid = 161
GROUP BY answer, yr, mo
HAVING COUNT(*) <= 2
ORDER BY timestamp ASC;

and the fiddle: http://sqlfiddle.com/#!2/1541eb/126

Gimmy
  • 3,781
  • 2
  • 18
  • 27
-2

There is no reason to reinvent a wheel and risk you have a buggy, suboptimal code. Your problem is trivial extension of common per group limit problem (see also tag ). There are already tested and optimized solutions to solve this problem.

Community
  • 1
  • 1
Tomas
  • 57,621
  • 49
  • 238
  • 373
  • 1
    Thanks Tomas, as I stated in my first post I have already looked at those posts and tried to apply it to my problem. I am certain that it's the right way to go, but I'll need some guidance as how to solve my problem using that solution. – Adam Jun 28 '13 at 22:49