1

I've been trying to do some CSV file merging using SQL in Python with SQL query being the following:

WITH
MATCHES AS( -- get all matches
    SELECT      CSV2.*
                , CSV1.ROW as ROW_1                 
                , CSV1.C4 as C4_1
                , CSV1.C5 as C5_1
    FROM        CSV2 
    LEFT JOIN   CSV1 
    ON          CSV1.C4 LIKE '%' || CSV2.C2 || '%'    
),
EXACT AS( -- matches where CSV1.C4 = CSV1.C5
    SELECT      *
    FROM        MATCHES
    WHERE       C4_1 = C5_1
),
MIN_ROW AS( -- CSV1.ROW of first occurence for each CSV2.C1
    SELECT      C1
                , min(ROW_1) as ROW_1
    FROM        MATCHES
    WHERE       C1 NOT IN (SELECT C1 FROM EXACT)
    GROUP BY    C1, C2, C3, C4, C5                  
)
-- use C4=C5 first
SELECT      *
FROM        EXACT
UNION
-- if match not in exact, use first occurence
SELECT      MATCHES.*
FROM        MIN_ROW
INNER JOIN  MATCHES
ON          MIN_ROW.C1 = MATCHES.C1
AND         (MIN_ROW.ROW_1 = MATCHES.ROW_1 OR MIN_ROW.ROW_1 IS NULL)
ORDER BY    C1

However, I keep receiving OperationalError: near "WITH": syntax error. I'm not very much familiar with SQL and haven't attempted to use it in Python before. What can be done to fix this error? Update1- The query within the first pair of brackets works fine without "with match as" part. Here are sample csv files: CSV1

data13      data23      d      main_data1;main_data2      data13         data23
data12      data22      d      main_data1;main_data2      data12         data22
data11      data21      d      main_data1;main_data2      data11         data21
data3       data4       d      main_data2;main_data4      data3          data4
data52      data62      d      main_data3                 data51         data62
data51      data61      d      main_data3                 main_data3     data61
data7       data8       d      main_data4                 data7          data8

CSV2

id1      main_data1      a1      a2      a3
id2      main_data2      b1      b2      b3
id3      main_data3      c1      c2      c3
id4      main_data4      d1      d2      d3
id5      main_data5      e1      e2      e3

and the Python code

import csv
import sqlite3

def createTable(cursor, rows, tablename):
    tableCreated = False
    for row in rows:
        if not tableCreated:
            sql = "CREATE TABLE %s(ROW INTEGER PRIMARY KEY, " + ", ".join(["c%d" % (i+1) for i in range(len(row))]) + ")"
            cur.execute(sql % tablename)
            tableCreated = True
        sql = "INSERT INTO %s VALUES(NULL, " + ", ".join(["'" + c + "'" for c in row]) + ")"
        cur.execute(sql % tablename)
    conn.commit()


conn = sqlite3.connect(":memory:")
cur = conn.cursor()

for filename, tablename in [(path_to_csv1, "CSV1"), (path_to_csv2, "CSV2")]:
    with open(filename, "r") as f:
        reader = csv.reader(f, delimiter=',')        
        rows = [row for row in reader]
    createTable(cur, rows, tablename)

Update2- I'm using Python 2.7

CL.
  • 173,858
  • 17
  • 217
  • 259
abn
  • 1,353
  • 4
  • 28
  • 58

2 Answers2

1

The WITH clause is supported since SQLite 3.8.3. The version shipped with Python 2.7 is much older.

Update to the latest Python, or use another database driver like apsw, or, if Python is compiled with dynamic libraries, try to replace the SQLite library file.

CL.
  • 173,858
  • 17
  • 217
  • 259
  • +1: Bumping sqlite to 3.21.0 solved this for me. -1: sqlite is not part of Python. See https://stackoverflow.com/a/1553190/2166224 . – kidmose Jan 12 '18 at 17:05
-1

Add ';' before With as

;WITH
MATCHES AS( ...
Deepshikha
  • 9,896
  • 2
  • 21
  • 21