4

I have a tab delimited file in the format:

sentenceID (sid)    documentID (scid)   sentenceText (sent)

E.g.

100004  100 即便您喜爱流连酒吧,也定然在这轻松安闲的一隅,来一场甜蜜沉醉的约会。
100005  100 您可以慢慢探究菜单上所有的秘密惊喜。

I want to put it into sqlite3 with the following schema:

CREATE TABLE sent (
    sid INTEGER PRIMARY KEY,
    scid INTEGER,
    sent TEXT,
    );

Is there a quick way to use the python API for sqlite (http://docs.python.org/2/library/sqlite3.html) to put them into a table?

I've been doing it as such:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sqlite3 as lite
import sys, codecs

con = lite.connect('mycorpus.db')

with con:    
    cur = con.cursor()
    cur.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")
    for line in codecs.read('corpus.tab','r','utf8'):
        sid,scid,sent = line.strip().split("\t")
        cur.execute("INSERT INTO Corpus VALUES("+sid+","+scid+"'"+sent+"')")
alvas
  • 115,346
  • 109
  • 446
  • 738

2 Answers2

3

Here's an example using unicodecsv module:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sqlite3

import unicodecsv


con = sqlite3.connect('mycorpus.db')
cur = con.cursor()
cur.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")

with open('corpus.tab', 'rb') as input_file:
    reader = unicodecsv.reader(input_file, delimiter="\t")
    data = [row for row in reader]

cur.executemany("INSERT INTO Corpus (sid, scid, sent) VALUES (?, ?, ?);", data)
con.commit()

Also see:

Hope that helps.

Community
  • 1
  • 1
alecxe
  • 462,703
  • 120
  • 1,088
  • 1,195
  • quotes? I don't see quotes in your example. Anyway, you can pass appropriate `quotechar` to the reader object or set `quoting`. See [docs](http://docs.python.org/2/library/csv.html#csv.Dialect.quotechar). – alecxe May 15 '13 at 17:28
  • =) No worries your code works for input without `"` or `'`. Just that some of my other lines on the file has crazy quotation marks. – alvas May 16 '13 at 01:23
3
    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    import sqlite3 as lite

    con = lite.connect('myCorpus.db')
    cur = con.cursor() 

    cur.execute("CREATE TABLE Corpus(sid INT, scid INT, sent TEXT, PRIMARY KEY (sid))")

    data=[row.split('\t') for row in file('myfile.tab','r').readlines()]
    cur.executemany("INSERT INTO Corpus (sid, scid,sent) VALUES (?, ?, ?);", data)

    con.commit()
alice88
  • 31
  • 1