10

Using SQLite, I'd like to split a string in the following way.

Input string:

C:\Users\fidel\Desktop\Temp

and have the query return these rows:

C:\
C:\Users\
C:\Users\fidel\
C:\Users\fidel\Desktop\
C:\Users\fidel\Desktop\Temp

In other words, I'd like to split a file path into its constituent paths. Is there a way to do this in pure SQLite?

Fidel
  • 7,027
  • 11
  • 57
  • 81

6 Answers6

12

This is possible with a recursive common table expression:

WITH RECURSIVE split(s, last, rest) AS (
  VALUES('', '', 'C:\Users\fidel\Desktop\Temp')
  UNION ALL
  SELECT s || substr(rest, 1, 1),
         substr(rest, 1, 1),
         substr(rest, 2)
  FROM split
  WHERE rest <> ''
)
SELECT s
FROM split
WHERE rest = ''
   OR last = '\';

(You did not ask for a reasonable way.)

CL.
  • 173,858
  • 17
  • 217
  • 259
  • I don't know much about using CTE's (at least in this fashion). How reliable/performant is it? That is, is it an implementation robust enough that could, with more tweaking, eventually make its way into a standard SQLite library function? Lack of SPLIT is probably the most painful day-to-day missing feature in Sqlite for me. – dancow Aug 23 '16 at 18:33
  • 2
    All of the answer so far are hardcoding the input string. Can someone please show how to make one of these solution generic so that, for instance, it is possible to split the `blah` column within a larger query? – Craig Silver Aug 28 '20 at 16:18
  • @CraigSilver That would be a different question, and you have to specify how exactly the output should look like. – CL. Aug 29 '20 at 06:09
  • @CraigSilver instead of the `VALUES` part you can just put another `SELECT`, e.g. `SELECT '', '', blah FROM someTable` – Adam Burley Jun 06 '22 at 18:48
10

Recursive CTE:

WITH RECURSIVE cte(org, part, rest, pos) AS (
  VALUES('C:\Users\fidel\Desktop\Temp', '','C:\Users\fidel\Desktop\Temp'|| '\', 0)
  UNION ALL
  SELECT org,
         SUBSTR(org,1, pos + INSTR(rest, '\')),
         SUBSTR(rest, INSTR(rest, '\')+1),
         pos + INSTR(rest, '\')
  FROM cte
  WHERE INSTR(rest, '\') > 0                         
)
SELECT *
FROM cte
WHERE pos <> 0
ORDER BY pos; 

SqlFiddleDemo

Output:

╔═════════════════════════════╗
║            part             ║
╠═════════════════════════════╣
║ C:\                         ║
║ C:\Users\                   ║
║ C:\Users\fidel\             ║
║ C:\Users\fidel\Desktop\     ║
║ C:\Users\fidel\Desktop\Temp ║
╚═════════════════════════════╝

How it works:

org  - original string does not change
part - simply `LEFT` equivalent of original string taking pos number of chars
rest - simply `RIGHT` equivalent, rest of org string
pos  - position of first `\` in the rest 

Trace:

╔══════════════════════════════╦══════════════════════════════╦════════════════════════════╦═════╗
║             org              ║            part              ║           rest             ║ pos ║
╠══════════════════════════════╬══════════════════════════════╬════════════════════════════╬═════╣
║ C:\Users\fidel\Desktop\Temp  ║ C:\                          ║ Users\fidel\Desktop\Temp\  ║   3 ║
║ C:\Users\fidel\Desktop\Temp  ║ C:\Users\                    ║ fidel\Desktop\Temp\        ║   9 ║
║ C:\Users\fidel\Desktop\Temp  ║ C:\Users\fidel\              ║ Desktop\Temp\              ║  15 ║
║ C:\Users\fidel\Desktop\Temp  ║ C:\Users\fidel\Desktop\      ║ Temp\                      ║  23 ║
║ C:\Users\fidel\Desktop\Temp  ║ C:\Users\fidel\Desktop\Temp  ║                            ║  28 ║
╚══════════════════════════════╩══════════════════════════════╩════════════════════════════╩═════╝
Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
7

If you want to search for the values ​​individually, use the code below:

WITH RECURSIVE split(content, last, rest) AS (
VALUES('', '', 'value1§value2§value3§value4§value5§value6§value7')
UNION ALL
  SELECT 

    CASE WHEN last = '§' 
            THEN
                substr(rest, 1, 1)
            ELSE
                content || substr(rest, 1, 1)
    END,
     substr(rest, 1, 1),
     substr(rest, 2)
  FROM split
  WHERE rest <> ''
)
SELECT 
       REPLACE(content, '§','') AS 'ValueSplit'     
FROM 
       split
WHERE 
       last = '§' OR rest ='';

Result:

**ValueSplit**

value1
value2
value3
value4
value5
value6
value7

I hope I can help people with the same problem.

Bruno Matos
  • 71
  • 1
  • 1
3

There's simpler alternative to the recursive CTE, that also can be applied to a number of file paths in a result set (or generally any delimited strings that you want to "split" into multiple rows by a separator).

SQLite has JSON1 extension. It's compatible with SQLite >= 3.9.0 (2015-10-14), but sqlite3 is almost always compiled with it now (e.g. Ubuntu, Debian, official Python Docker images and so on, and you can check it with PRAGMA compile_options and this answer has a little more detail on it).

JSON1 has json_each, which is one of the two table-valued functions in the extension that:

walk the JSON value provided as their first argument and return one row for each element.

Hence if you can turn your string into a JSON array string, this function will do the rest. And it's not hard to do.

const sql = `
  WITH input(filename) AS (
    VALUES  
      ('/etc/redis/redis.conf'),
      ('/run/redis/redis-server.pid'),
      ('/var/log/redis-server.log')
  ), tmp AS (
    SELECT 
      filename, 
      '["' || replace(filename, '/', '", "') || '"]' as filename_array
    FROM input
  )
  SELECT (
    SELECT group_concat(ip.value, '/') 
    FROM json_each(filename_array) ip 
    WHERE ip.id <= p.id
  ) AS path
  FROM tmp, json_each(filename_array) AS p
  WHERE p.id > 1  -- because the filenames start with the separator
`

async function run() {
  const wasmUrl = 'https://cdnjs.cloudflare.com/ajax/libs/sql.js/1.5.0/sql-wasm.wasm'
  const sqljs = await window.initSqlJs({locateFile: file => wasmUrl})
  const db = new sqljs.Database()
  const results = db.exec(sql) 
  ko.applyBindings(results[0])
}
run()
<script src="https://cdnjs.cloudflare.com/ajax/libs/knockout/3.4.2/knockout-min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/sql.js/1.5.0/sql-wasm.min.js"></script>
<table>
  <thead>
    <tr data-bind="foreach: columns"><th data-bind="text: $data"></th></tr>
  </thead>
  <tbody data-bind="foreach: values">
    <tr data-bind="foreach: $data"><td data-bind="text: $data"></td></tr>
  </tbody>
</table>
saaj
  • 23,253
  • 3
  • 104
  • 105
1

Inspired from Lukasz Szozda's answer:

 WITH RECURSIVE cte("pre","post") AS (   
   VALUES('C:', 'Users\fidel\Desktop\Temp' || '\')
 UNION ALL
   SELECT "pre" || '\' || left("post",     position('\' in "post")-1),
                     substring("post" from position('\' in "post")+1)
   FROM cte   
   WHERE "post" > ''                             
 ) 
 SELECT "pre" FROM cte

(tested on PostgreSQL)

The idea is now to replace the VALUES line

VALUES('C:', 'Users\fidel\Desktop\Temp' || '\')

with placeholders like

 VALUES(?, ? || '\')

which have been pre-split in the programming language that is going to run the SQL statement above against the data base.

Reading the SQLite docs, I see that substring(... from ...) has to be replaced by substr(..., ...) and position(... in ...) is to be replaced by instr(..., ...) with parameters swapped.

Very annoying for me since I wanted SQL code that runs on both PostgreSQL and SQLite.

stonux
  • 11
  • 1
1

Simple split using json_each from JSON1:

create table demo as select 'Split,comma,separated,string,into,rows' as DemoString;
select row_number() over () as Part, parts.value as Splitted 
    from demo, json_each('["'||replace(demo.DemoString,',','","')||'"]') parts;
Jim Beem
  • 21
  • 3
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Apr 10 '22 at 16:36