0

Table

I have this table below my goal is to take the average of columns perDem,perGop,perInd but only the last three based on the day.How can I write such a query I tried to use order by,group by but am not able to achieve the results. I need an avg of each column but only the last three entries based on the day. so for Alabama for example average perDem,perGop,PerInd but only three rows which are last if ordered by day. This is what I have so far but I need this to only take the avg of the last three based on the day submitted for that particular state.

select polls.state,evotes,avg(perDem),avg(perGOP),avg(perInd)
             from polls,electoral
             where electoral.state=polls.state
             group by electoral.state,polls.state";
  • You have only listed three columns, but your comment `but only the last three based on the day` suggests there should be more. Please clarify. – toonice Apr 18 '17 at 03:48
  • Do you mean you are interested in the three most recent entries for each day? – toonice Apr 18 '17 at 03:52
  • yes exactly thats what I mean. – CanyoucSharp Apr 18 '17 at 03:53
  • How do you determine the order in which they occur (so that we can determine which are the most recent)? – toonice Apr 18 '17 at 04:01
  • this is a query that I have at the moment select polls.state,evotes,avg(perDem),avg(perGOP),avg(perInd) from polls,electoral where electoral.state=polls.state group by electoral.state,polls.state"; which just takes the average of all of the polls but I need the last three – CanyoucSharp Apr 18 '17 at 04:03
  • To improve the quality of your question, please mention what research you have performed, especially any sources that have proved particularly helpful (with links). Please show the structure of your table(s) with sample data as text in the question rather than as a link to an image. This is because having it in the text of the Question is more convenient for those reading your Question, and because links can break. Please show what code you are currently trying along with the full text of any error messages it produces. – toonice Apr 18 '17 at 04:05
  • How do we determine which are `the last three`? – toonice Apr 18 '17 at 04:07
  • the last three if ordered by the day as you said the most recent three so if alabama has 8 entries with day 1,20,40,60,100,200,250,270 I will only take the avg of the entries 270,250,200 – CanyoucSharp Apr 18 '17 at 04:08
  • If we `ORDER BY day` there is no guarantee that the top or bottom three will be the three most recent. Changing `day ` to a `DATE` or `DATETIME` field would alleviate this problem *if* the `TIME` component were an accurate reflection of the order in which they should appear in a sorted list. You could also use a field for the day number / date and a field for the time or a field that indicates where it should appear. If you are only interested in the order in which they are entered then you should add a 'pollsID` field and use an `AUTO_NUMBER`. Is field we can use for subsorting currently? – toonice Apr 18 '17 at 04:15
  • Oh, so you mean the average over all days rather than for each day? Also, if the most recent day has more than 3 entries (possible, maybe even likely) how do you choose which rows? – toonice Apr 18 '17 at 04:17
  • yes exactly you mean if 2 days are = chose either 1 – CanyoucSharp Apr 18 '17 at 04:19
  • Will do. For future reference, please consider a field that can be used for subsorting. – toonice Apr 18 '17 at 04:20
  • Database wasn't created by me – CanyoucSharp Apr 18 '17 at 04:26
  • Please try my Answer – toonice Apr 18 '17 at 04:32
  • If a state has no stats (for some reason) will you still want to list it? Or can we assume that this case will never happen? – toonice Apr 18 '17 at 05:45

2 Answers2

0

Please try the following...

set @rowNum := 0;
set @state := '';

SELECT state,
       AVG( perDem ) AS avgPerDem,
       AVG( perGOP ) AS avgPerGOP,
       AVG( perInd ) AS avgPerInd
FROM
(
    SELECT day AS day,
           state AS state,
           perDem AS perDem,
           perGOP AS perGOP,
           perInd AS perInd,
           @rowNum := if ( @state = state,
                           @rowNum + 1,
                           1 ) AS rowNum,
           @state := state AS valueHolder
    FROM polls
    ORDER BY state,
             day DESC
) rowNumGenerator
WHERE rowNum <= 3
GROUP BY state
ORDER BY state;

Here the inner query returns sorts the contents sorts the contents of polls by state in alphabetical order and for each state it further sorts the rows by day, from largest to smallest.

Once this is done it returns the fields of interest (day, state, perDem, perGOP and perInd) as well as a row number that it generates based on the following pattern...

Where a new state is encountered it gives this first row a row number of 1.

Each subsequent row for that state is given the next available row number.

This effectively assigns a number to each record's position within the ordered list, relative to the first record for that state.

The outer query selects only those rows whose position / row number places it in the top three for its state.

The records of the resulting dataset are then grouped by state. Please note that the outer SELECT statement does not know the grouping used by the inner one, and can not safely assume it. Thus it will assume that there is no grouping unless otherwise instructed.

The mean average for each state's per fields are then calculated and all specified fields are returned to the user.

If you have any questions or comments, then please feel free to post a Comment accordingly.

toonice
  • 2,211
  • 1
  • 13
  • 20
  • Redundant line removed. – toonice Apr 18 '17 at 04:29
  • Added a missing line. – toonice Apr 18 '17 at 04:31
  • LIMIT 0, 1000 Error Code: 1055. Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'voting.polls.perDem' which is not functionally dependent on column – CanyoucSharp Apr 18 '17 at 04:37
  • Hmm... I'll look into that. – toonice Apr 18 '17 at 04:38
  • I have been also having the same issue with group by :( really appreciate the help.I really need to read a database book but short on time at the moment. – CanyoucSharp Apr 18 '17 at 04:38
  • No worries. I have spotted at least one problem with my code. It will take me a few minutes to adapt. Good time to get a cuppa, if you are so inclined. – toonice Apr 18 '17 at 04:42
  • Hey i exported the entire database if you would like it for testing https://www.justbeamit.com/p2w9q – CanyoucSharp Apr 18 '17 at 04:48
  • Been having some pain, but on the right track here. – toonice Apr 18 '17 at 05:21
  • Next attempt posted. – toonice Apr 18 '17 at 05:43
  • Will try need a minute – CanyoucSharp Apr 18 '17 at 05:57
  • got this error LIMIT 0, 1000 Error Code: 1267. Illegal mix of collations (utf8_general_ci,IMPLICIT) and (utf8_unicode_ci,IMPLICIT) for operation '=' 0.000 sec – CanyoucSharp Apr 18 '17 at 06:08
  • I have posted my last attempt. If it is not too awkward for you, would you mind testing it? – toonice Apr 18 '17 at 06:13
  • Yes will try few mins – CanyoucSharp Apr 18 '17 at 06:30
  • I have created a sample table and populated it with the visible data in your picture. I then tested my code against this table and it worked fine. I then plonked `http://stackoverflow.com/questions/1241856/illegal-mix-of-collations-error-in-mysql` into google and found http://stackoverflow.com/questions/1241856/illegal-mix-of-collations-error-in-mysql. `Dean Rather`'s answer seems to be the most helpful. I thought you might find this useful for future reference. I shall now get to typing up my explanation. – toonice Apr 18 '17 at 08:33
  • Thank you so much post was rather usefull~ – CanyoucSharp Apr 18 '17 at 19:55
  • You're welcome - glad to be of help. Please note that if you feel an Answer was also significantly useful in addition to the one accepted, it is encouraged that you upvote the Answer. This gives a small reward of 10 points to the answerer and indicates to anyone reviewing the Question and its Answers that this Answer is more noteworthy than an Answer that has not received an upvote. – toonice Apr 18 '17 at 23:38
0

One possible approach is to make use of MySQL user-defined variables as a way to emulate analytic/windowing functions available in other databases.

SELECT v.state
     , e.evotes
     , AVG(IF(v.n<=3,v.perDem,NULL)) AS perDem_avg_last_3_day
     , AVG(IF(v.n<=3,v.perGOP,NULL)) AS perGOP_avg_last_3_day
     , AVG(IF(v.n<=3,v.perInd,NULL)) AS perInd_avg_last_3_day
  FROM ( SELECT @i := IF(p.state = @p_state,@i+1,1) AS n
              , @p_state := p.state AS `state`
              , p.perDem
              , p.perGOP
              , p.perInd
           FROM polls p
          CROSS
           JOIN ( SELECT @p_state = '', @i := 0 ) i
          ORDER
             BY p.state DESC
              , p.day DESC
       ) v
  JOIN electoral e
    ON e.state = v.state
 GROUP BY v.state, e.evotes
 ORDER BY v.state

NOTE: The MySQL Reference manual specifically warns against this type of usage of user-defined variables; but up through MySQL 5.6, we observe consisted behavior with carefully constructed SELECT statements.)

From a performance standpoint, this approach is suboptimal for large sets (materializing the inline view, and likely requiring a "Using filesort" operation.)

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • I get this Error Code: 1054. Unknown column 'p.state' in 'on clause' – CanyoucSharp Apr 18 '17 at 05:18
  • change that to `ON e.state = v.state`. – spencer7593 Apr 18 '17 at 05:19
  • changing that then resulted in this error LIMIT 0, 1000 Error Code: 1055. Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'voting.e.evotes' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by 0.000 sec – CanyoucSharp Apr 18 '17 at 05:21
  • so just add `e.evotes` to the GROUP BY clause. Or remove `e.evotes` from the SELECT list. Or change `sql_mode` to a setting that doesn't include ONLY_FULL_GROUP_BY. – spencer7593 Apr 18 '17 at 05:34