1

I have a table with ID (IP address) and a factor variable (Web browser) and I need to make another table that has a single record for each ID, together with the mode of the factor variable. I was thinking of something like SELECT ip, MODE (browser) FROM log GROUP BY ip.

Unfortunately, SQLite doesn't implement a MODE function, so this doesn't work. I thought of building a temporary table with the counts of each browser and then using a SELECT DISTINCT ON or a RANK () statement but SQLite doesn't support these either.

Additionally, it would be nice to this in a single statement because there are several other factors whose mode I also need (and are also grouped by the same ID).

wizplum
  • 427
  • 5
  • 17

2 Answers2

2

To compute the mode, group by the browser column, get the COUNT(*) for each group, sort by that value, and take the record with the largest value.

If you already have another GROUP BY, use a correlated subquery:

SELECT ip,
       (SELECT browser
        FROM log AS log2
        WHERE ip = ips.ip
        GROUP BY browser
        ORDER BY COUNT(*) DESC
        LIMIT 1)
FROM (SELECT DISTINCT ip
      FROM log) AS ips
CL.
  • 173,858
  • 17
  • 217
  • 259
0

There is a log table with timestamp, label, and latency. We want to see the latency (sending time: ST) MODE ( módusz ) value of each label, grouped by timestamp. The MODE of a set of data values is the value that appears most often.

select L, T, avg( ST ) as MODEST, C
from (
    select L, T, ST, count( ST ) as C
    from (
            select label as L, 
                         substr( substr( timeStamp, 0, 8) || '00000000', 0, 14 ) as T, 
                         latency as ST 
            from LOG 
            order by L, T, ST 
            ) as XX 
    group by L, T, ST 
) as YY
where L || '#' || T || '#' || C in (  select L || '#' || T || '#' || max(C)
                                      from(
                                        select L, T, count( ST ) as C
                                        from (
                                          select label as L, 
                                          substr( substr( timeStamp, 0, 8) || '00000000', 0, 14 ) as T, 
                                          latency as ST
                                          from LOG 
                                        ) as XX 
                                        group by L, T, ST ) as YY
                                      group by L, T )
group by L, T, C