
I have several txt files with entries containing latitude and longitude coordinates as well as some other data. I want to insert these data into separate SQLite tables, differentiated by their latitude and longitude coordinates. All in all there are 23 geographical zones that I want to give their own table. However, I am a little unsure how best to do that. My idea so far is to read and insert the data into one database table, then split it into the different zones:

 cur.execute("CREATE TABLE Total (unixtime int, mmsi int, latitude float, longitude float, port int)".format(site = site))
 cur.execute("CREATE INDEX latitude_index ON Total (latitude)")
 cur.execute("CREATE INDEX longitude_index ON Total (longitude)")

def writeToDatabase(databasepath, tablename, data):
    con = lite.connect(databasepath)
    cur = con.cursor()
    query = """
    INSERT INTO
        {table} (unixtime, mmsi, latitude, longitude, port)
    VALUES (?,?,?,?,?)
""".format(table=tablename)
    cur.executemany(query, data)
    con.commit()
    con.close()


if __name__ == "__main__":
    words = []
    port = []
    mmsi = []
    unixtime = []
    longitude = []
    latitude = []
    newdata = []
    databasepath = "SAISGlobal.db"
    createDatabaseTable(databasepath)
    for filename in os.listdir('data'):
        with open("data/"+filename) as f:   # Use file to refer to the file object
            data = f.readlines()
            for line in data:
                words = line.split(";")
                port.append(int(words[0]))
                mmsi.append(int(words[1]))
                unixtime.append(int(words[6]))
                longitude.append(float(words[2]))
                latitude.append(float(words[3]))
        newdata = zip(unixtime, mmsi, latitude, longitude, port)
        writeToDatabase("SAISGlobal.db", 'Total', newdata)

The idea is then to extract the data from the Total table and insert it into different tables for each zone using SQLite queries. However, is there maybe a better way to do this? Each file contains 200-300 MB of data, and there are 300-400 files.
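
For the query route, this is roughly what I have in mind; `fillZoneTable` is just a sketch, and the BETWEEN bounds would be placeholders until I fill in the real limits for each of the 23 zones:

def fillZoneTable(cur, zone, latmin, latmax, lonmin, lonmax):
    # Sketch only: creates an empty copy of Total's columns for one zone,
    # then copies that zone's rows over. The bounds are placeholders.
    cur.execute("CREATE TABLE IF NOT EXISTS Zone{0} AS SELECT * FROM Total WHERE 0".format(zone))
    cur.execute("""INSERT INTO Zone{0}
                   SELECT * FROM Total
                   WHERE latitude BETWEEN ? AND ?
                   AND longitude BETWEEN ? AND ?""".format(zone),
                (latmin, latmax, lonmin, lonmax))

The latitude/longitude indexes on Total would be what makes these 23 SELECTs fast.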

Could I, for example, split the data into zones before I insert it into the database? As there are 23 zones, I'm really unsure how I would do this. Does anyone have an idea?
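
The only shape I can think of for that is something like the sketch below, assuming a `zoneOf()` lookup I would still have to write (it's hypothetical), and assuming the 23 zone tables already exist with the same columns as Total:

def zoneOf(latitude, longitude):
    # Hypothetical: maps a coordinate to one of the 23 zone numbers.
    # The real zone boundaries would have to go here.
    raise NotImplementedError

def bucketFile(path):
    buckets = {}  # zone number -> list of rows for that zone
    with open(path) as f:
        for line in f:
            words = line.split(";")
            row = (int(words[6]), int(words[1]), float(words[3]),
                   float(words[2]), int(words[0]))
            buckets.setdefault(zoneOf(row[2], row[3]), []).append(row)
    return buckets

# one executemany per zone table instead of one insert into Total:
for zone, rows in bucketFile("data/" + filename).items():
    writeToDatabase(databasepath, "Zone{0}".format(zone), rows)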

Any other suggestions?

Edit:

Running time up to the writeToDatabase call is 17.17 seconds for one file; writeToDatabase itself takes 60-70 seconds per file. If I remove the indexes, writeToDatabase drops to 9-20 seconds. However, I need to write this data to other tables as well, and those tables will need their own indexes.
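
To keep the import fast but still end up with indexes, I could presumably drop them before the bulk insert and rebuild them once at the end, along these lines:

con = lite.connect(databasepath)
con.execute("DROP INDEX IF EXISTS latitude_index")
con.execute("DROP INDEX IF EXISTS longitude_index")
# ... run the whole import here ...
con.execute("CREATE INDEX latitude_index ON Total (latitude)")
con.execute("CREATE INDEX longitude_index ON Total (longitude)")
con.commit()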

  • You don't need to read all lines into memory before handling them; just use `for line in f:`. Do you really want to write only the last line of each file to the database? Why are you reopening the database every time? – CL. Jul 20 '15 at 12:41
  • @CL. Thank you for your comment. The reason I reopen/write to the database for each file is that I don't have enough RAM to hold all the values in memory before I enter them into the database. I put the newdata assignment in the outer for loop now, thank you for noticing. – bjornasm Jul 20 '15 at 12:44
  • Profile your program. Which part is slow, parsing or writing? – CL. Jul 20 '15 at 12:58
  • @CL. Thank you. After profiling I've found that the running time up to the writeToDatabase call is 17.17 seconds for one file; writeToDatabase itself takes 60-70 seconds for one file. – bjornasm Jul 20 '15 at 13:08
  • If you don't care about a proper rollback when your program crashes, try changing [PRAGMA synchronous and PRAGMA journal_mode](http://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite) (see the sketch after these comments). – CL. Jul 20 '15 at 13:14
  • Thank you, this saves some time. I see that for some reason the inserting process takes linearly longer for each file. Is there some way to avoid this? I also still have the issue of splitting the data into zones; if you have any pointers there, please share. – bjornasm Jul 20 '15 at 13:32
  • I guess you have indexes. – CL. Jul 20 '15 at 13:34
  • @CL. Yes, I have updated my initial question with the CREATE TABLE. I think I sadly need indexes, especially on the zone-specific tables. – bjornasm Jul 20 '15 at 13:36
  • But you don't need them while importing. – CL. Jul 20 '15 at 13:41
  • If I write to the database for 10 files at a time (instead of just one at a time as in the first code), I use approximately 13 seconds per file, then a total of 210 seconds for the 10 files (21 seconds per file) to write. Without indexes. – bjornasm Jul 20 '15 at 13:49
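
For reference, a minimal sketch of the PRAGMA changes CL. suggests above (this trades crash safety for insert speed, so it only makes sense during the import):

con = lite.connect(databasepath)
con.execute("PRAGMA synchronous = OFF")      # no fsync after every write
con.execute("PRAGMA journal_mode = MEMORY")  # keep the rollback journal in RAM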

0 Answers