1

I have a database, db and in it a table, Table.

It looks somewhat like:

id | val
--------
1  | 45
2  | 35
3  | 23
4  | 49
5  | 67
6  | 12
7  | 0
8  | 87
9  | 46

(This is just an example data set. Actual data set is huge. And I need to work in least time possible.)

I need to find the median of the column val. Actually I need a php function to be used multiple times.

A similar question does exist: Simple way to calculate median with MySQL

I tried a few answers in this question, none of them worked for me. The accepted answer doesn't work since it used to work with an older version of SQL only.

PS: It should also work in the case of many duplicates.

Community
  • 1
  • 1
Ranveer
  • 6,683
  • 8
  • 45
  • 92
  • If you just wanna do it in php, just pull out all the values, put into an array, sort the array, find median. – user602525 Dec 14 '13 at 06:44
  • Won't that be slow? Sorting and then finding the mid value? – Ranveer Dec 14 '13 at 06:45
  • Well that depends on the data set for sure. I'm just going based on what's up there right now. – user602525 Dec 14 '13 at 06:46
  • This was just an example, I actually have a huge data set. I'll modify the question. Thanks! – Ranveer Dec 14 '13 at 06:46
  • This is just a thought, but could you first count the size of data via sql. Then query for the data (sorted), but select 1/2 of the count rows, so you're at the mid point, then just last row of data? – user602525 Dec 14 '13 at 06:55
  • Some thoughts on algorithms: http://stackoverflow.com/questions/4201292/on-algorithm-to-find-the-median-of-a-collection-of-numbers – Jorge Campos Dec 14 '13 at 06:56
  • And this one: http://cs.stackexchange.com/questions/1914/to-find-the-median-of-an-unsorted-array – Jorge Campos Dec 14 '13 at 06:56

1 Answers1

-1

just for fun i thought i try and do it all in MySQL, here's the sqlFiddle

SELECT 
  CASE 
  WHEN MOD((select count(*) as count from t),2)=1 THEN
      (select val from
           (select @row:=@row+1 as row,val
            from t,(select @row:=0)r
            order by val)t1
       where t1.row = CEIL((select count(*) as count from t)/2)
      )
  ELSE
       ((select val from
           (select @row:=@row+1 as row,val
            from t,(select @row:=0)r
            order by val)t1
         where t1.row = (select count(*) as count from t)/2)+
        (select val from
           (select @row:=@row+1 as row,val
            from t,(select @row:=0)r
            order by val)t1
         where t1.row = ((select count(*) as count from t)/2)+1))/2 
  END AS median

Just replace occurences of t with your table name, don't change t1. Also if the table has no rows, it'll return NULL as median.

This query can be further reduced to the below (sqlFiddle)

SELECT @rowCount:=(select count(*) as count from t) AS rowcount,
      (select AVG(val) from
           (select @row:=@row+1 as row,val
            from t,(select @row:=0)r
            order by val)t1
       where t1.row IN (FLOOR((@rowCount+1)/2),
                         CEIL((@rowCount+1)/2)
                        )
      ) as Median

It'll return 2 columns, a rowcount column and a median column. I put the rowcount column there because i didn't want to count from t multiple times like previous query.

Tin Tran
  • 6,194
  • 3
  • 19
  • 34