
I have a SQLite database file (file.db) stored in an S3 bucket.

I want to connect to the database directly from that file, something like this:

import boto3
import pandas as pd
import sqlite3

s3 = boto3.resource('s3')

cnx = sqlite3.connect('s3://bucket/file.db')
df = pd.read_sql_query('select * from table', cnx)

Is it possible to read the file from S3 with Python/pandas? I understand the tricky part is that the connection needs the file as a local path, not as an S3 object.

Any comments would help me a lot.

Juan Almada

1 Answer


There are a couple of different ways you can approach this problem.

Approach #1

The first approach is to download the SQLite database to the local filesystem and then open it with sqlite3. You can download the file with boto3's download_file method into the current working directory, then use

import sqlite3
con = sqlite3.connect("file.db")

to open the file.

However, any changes you make to the local copy will not be reflected in S3 unless you upload the file again.
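Putting the two steps together, a minimal sketch might look like this. The bucket, key, and table names are placeholders, and the S3 download assumes boto3 is installed and AWS credentials are configured:

```python
import sqlite3

def download_db(bucket, key, dest):
    """Fetch the SQLite file from S3 to a local path.

    Placeholder bucket/key/dest names; needs boto3 and AWS credentials.
    """
    import boto3  # imported here so the sqlite part works without boto3
    boto3.client("s3").download_file(bucket, key, dest)

def query_db(path, sql):
    """Open a local SQLite file and return all rows for a query."""
    con = sqlite3.connect(path)
    try:
        return con.execute(sql).fetchall()
    finally:
        con.close()

# Usage sketch (placeholder names):
# download_db("my-bucket", "file.db", "file.db")
# rows = query_db("file.db", "SELECT * FROM some_table")
```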

Approach #2

The second approach is to use Litestream. Litestream is a tool that replicates a SQLite database to S3 by continuously shipping the write-ahead log (WAL). The WAL is a record of what changed in the file, and the current state of the database can be recovered from an old copy of the database plus the WAL. Every so often, Litestream compacts the WAL by writing a new copy of the SQLite database and then deleting the old copy and the WAL.
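You can see the WAL mechanism Litestream builds on using nothing but the standard-library sqlite3 module; the file and table names below are made up for illustration:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path, isolation_level=None)

# Switch to write-ahead logging: changes are appended to demo.db-wal
# instead of being written into demo.db directly.
mode = con.execute("PRAGMA journal_mode=wal").fetchone()[0]

con.execute("CREATE TABLE t (x INTEGER)")
con.execute("INSERT INTO t VALUES (1)")
wal_exists = os.path.exists(path + "-wal")  # the WAL now holds the change

# A checkpoint folds the WAL back into the main database file,
# which is roughly what Litestream's compaction step does on S3.
con.execute("PRAGMA wal_checkpoint(TRUNCATE)")
con.close()
```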

They have a tutorial which explains how to use it.

Limitations

What if you wanted to have a SQLite database in S3, and have multiple clients which read and write to this database?

This is not possible. SQLite relies on filesystem locks to coordinate writers, and S3 provides neither locking nor atomic in-place updates. For multiple clients reading and writing concurrently, you need a client-server database such as PostgreSQL or MySQL.
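Even on a local filesystem, SQLite allows only one writer at a time; S3 removes the locking that makes even that safe. A small sketch of the single-writer limit, using only the standard-library sqlite3 module:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None puts the connections in autocommit mode so the
# explicit BEGIN below opens the transaction; timeout=0 makes a lock
# conflict fail immediately instead of retrying.
con1 = sqlite3.connect(path, timeout=0, isolation_level=None)
con2 = sqlite3.connect(path, timeout=0, isolation_level=None)
con1.execute("CREATE TABLE t (x INTEGER)")

con1.execute("BEGIN IMMEDIATE")      # con1 takes the write lock
con1.execute("INSERT INTO t VALUES (1)")
try:
    con2.execute("BEGIN IMMEDIATE")  # second writer is rejected
    locked = False
except sqlite3.OperationalError:     # "database is locked"
    locked = True
con1.execute("COMMIT")
```

SQLite arbitrates this with filesystem locks, which is exactly what S3 cannot provide.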

Nick ODell