SQLITE transpose large number of rows into columns

Question

I have a database ordered like so:

    ID     |    Subject     |      Value
 ---------------------------------------------
     1            Subj1            Val1
     1            Subj2            Val2
     2            Subj1            Val3
     2            Subj5            Val4

And so on. IDs number in the thousands, and the subjects in tens of thousands.

I want to find the transpose of this matrix. How do I do that?

If the subjects were a small, static set, then I could use CASE statements as in this solution: SQLITE - transposing rows into columns properly, but in my situation, subjects are a large, dynamic set, so CASE won't work unless I'm building the SQL dynamically in some application outside the database. In my case, there is no "application"; as I mention in a comment below, I'm looking for a pure-SQL solution.

Here's the kicker: I'm using SQLite, and it's missing the PIVOT statement that would make this easier. I don't know if there's still a way to do this, and haven't been taught much past types of joins. For a smaller database I would have tried a series of left joins, but since I have so many column values in my result I don't know how.

How do I convert to this form?

      ID      |     Subj1       |        Subj2      |    Subj3     |       etc.
   --------------------------------------------------------------------
      1              Val1                  Val2            0        
      2              Val3                   0              0

This is a valid question -- without PIVOT, sqlite makes this hard when the number of resulting columns is dynamic. For how this works in another db engine, see https://stackoverflow.com/a/27532568/1264797 — stevegt, Jun 30 '17 at 20:39

saaj · Answer 1 · 2021-05-16T17:49:31.473

SQLite (at least as of 3.31) out-of-the-box doesn't have PIVOT, CROSSTAB or any other similar functionality. More generally it doesn't have a means to produce columns dynamically (and the resort is manual creation of CASE expressions to define columns). But there's pivot_vtab virtual table extension which implements a pivot table functionality. I guess it's sort of proof-of-concept implementation, but still can be useful depending on the circumstances.

Here's the long description from the repo (redacted so it fits a snippet):

CREATE VIRTUAL TABLE pivot USING pivot_vtab(
  --
  -- Pivot table row key query. Defines first column of the pivot table.
  -- [...]
  -- The first column name in this query will become the name of the pivot table key 
  -- column. The value of the [...] key column is provided to the pivot query as ?1.
  --
 (SELECT id r_id -- pivot table key
    FROM r), 
  --
  -- Pivot table column definition query. Defines second+ column(s) of the pivot
  -- table. This query should return pivot table column key/name pairs.
  -- [...]
  -- The first column of this query is the pivot column key, and is provided
  -- to the pivot query as ?2. The second column of this query is used to name the  
  -- pivot table columns. This column is required to return unique values.
  -- 
  -- Changes to this query can only be propagated by dropping and 
  -- re-creating the virtual table
  --
 (SELECT id c_id,   -- pivot column key - can be referenced in pivot query as ?2
         name       -- pivot column name
    FROM c),    
  --
  -- Pivot query. This query should define a single value in the pivot table when
  -- filtered by the pivot table row key (1?) and a column key (2?)
  --
 (SELECT val FROM x WHERE r_id = ?1 AND c_id = ?2)
);

While I was playing with it I had a couple of issues:

with invalid CREATE VIRTUAL TABLE ... it can crash with segmentation fault (e.g. on missing table)
the only error message you get out of it is vtable constructor failed: to_be_table
it expects separate table with unique pivot column id-name pairs (were character case makes a difference), and wouldn't create a pivot otherwise

I'll use this GitHub Gist with a CSV with salary data.

Here I prepare the data. Import the CSV into :memory: SQLite3 database, load the extension and clean the data (I load the records of software engineers at FAANG into a temporary table).

.mode csv 
.import salaries.csv salary_import

.load ./pivot_vtab
.headers on
.mode column

CREATE TABLE temp.salary AS
WITH clean AS (
  SELECT 
    Employer employer, 
    lower(trim("Job Title")) title, 
    replace("Annual Base Pay", ',', '') base_pay
  FROM salary_import
)
SELECT employer, title, round(avg(base_pay)) avg_base_pay, COUNT(*) count
FROM clean
WHERE
  employer IN ('Facebook', 'Amazon', 'Apple', 'Netflix', 'Google')
  AND title LIKE '%software engineer%'
GROUP BY 1, 2;

And here it's actual pivot creation (it's a temporary table in my example, but it can be persistent just as well).

CREATE VIRTUAL TABLE temp.pivot USING pivot_vtab(
  (SELECT employer FROM temp.salary GROUP BY employer),
  (SELECT title, title FROM temp.salary GROUP BY title),   
  (
    SELECT avg_base_pay
    FROM temp.salary
    WHERE employer = ?1 AND title = ?2
  )
);
SELECT * FROM temp.pivot;

To run it, save these files in a directory:

salary.sql (the two snippets above combined)
salaries.csv (the CSV from the Gist)
pivot_vtab.c (the code of the extension)

Then run it like this (chmod o+w . if you run user-namespaced Docker):

$ docker run --rm -it -v $PWD:/tmp/build -w /tmp/build ubuntu:focal
# apt update
# apt install -y --no-install-recommends gcc sqlite3 libsqlite3-dev
# sqlite3 --version
3.31.1 2020-01-27 19:55:54 3bfa9cc...
# gcc -g -O3 -fPIC -shared pivot_vtab.c -o pivot_vtab.so
# cat salary.sql | sqlite3 --bail
employer    data engineer/software engineer  senior software engineer  software eng...
----------  -------------------------------  ------------------------  ------------...
Amazon                                                                 130000.0    ...
Apple       145000.0                         155500.0                  122667.0    ...
Facebook                                     182000.0                  166690.0    ...
Google                                       158267.0                  131465.0    ...
Netflix                                      340000.0                              ...

Note that as of commit [49d96600](https://github.com/jakethaw/pivot_vtab/commit/49d96600) the 3 issues I mentioned above have been address by the author of `pivot_vtab`. — saaj, Jun 14 '21 at 20:52

Gurwinder Singh · Answer 2 · 2017-03-27T04:00:26.083

1

You can use conditional aggregation for pivoting your data:

select id,
    max(case when Subject = 'Subj1' then Value end) as Subj1,
    max(case when Subject = 'Subj2' then Value end) as Subj2,
    max(case when Subject = 'Subj3' then Value end) as Subj3,
    . . . 
from your_table
group by id;

Note here that if there are multiple rows with same id and same Subject, only one row with max value will be returned for the id.

edited Mar 27 '17 at 04:00

answered Mar 27 '17 at 03:58

Gurwinder Singh

38,557
6
51
76

I understand the idea: but how do I write cases when I have 20000-ish values for subject? Is there some query that lets me aggregate? – tvishwa107 Mar 27 '17 at 04:00
@tvishwa107 - It's odd to have *20000* subjects. Anyhow, If you have too many subjects, then consider handling this in your application code. – Gurwinder Singh Mar 27 '17 at 04:02
Can't - this is a database question, there's no application involved. – tvishwa107 Mar 27 '17 at 04:08

SQLITE transpose large number of rows into columns

2 Answers2

Linked