
I am working with a SQLite database using SQLAlchemy. It is quite a large database, and reloading it multiple times, i.e. on each iteration of a loop, is slowing down my operations. I only actually need the last 10 rows of data for my calculations. How can I load just those last few rows and convert them to a pandas DataFrame using pd.read_sql? I have tried using the stream_results functionality with chunksize as follows:

db_name = "SELECT * FROM" + " " + (print_name + "_df")
conn = db_engine.connect().execution_options(stream_results=True)

for df_10 in pd.read_sql(db_name, conn, chunksize=10):
        df = df_10

This works in that df_10 ends up holding the last 10 rows; however, the operation still has to iterate over every chunk, which takes longer than loading the whole database in one go. Is there any way to just pull the latest 10 rows without having to either load the whole lot or split it and load it in n chunks of 10?

  • That certainly looks promising. Would I be able to use the LIMIT statement from within read_sql, or would I use it within db_engine.connect()? Apologies, I am less than familiar with SQLite and these are my first forays into this. – DaveTheRave Apr 03 '22 at 20:40
  • Yes, you should be able to add LIMIT to the end of the SELECT, which will be passed through to SQLite. You can read more about it in the SQLite docs: [limit offset](https://www.sqlite.org/lang_select.html#limitoffset). – Ian Wilson Apr 03 '22 at 23:32
  • Also note that to get the "last" 10 rows you will need to sort the rows in some order (using the ids or a date field with DESC) and then apply LIMIT, with an implicit offset of 0. Otherwise you will just get the first 10 rows. – Ian Wilson Apr 03 '22 at 23:37 (a sketch combining both suggestions follows below)
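Putting the two comments together, here is a minimal sketch. It assumes the table has a monotonically increasing `id` column to order by (any rowid or timestamp column would work), and reuses `print_name` and `db_engine` from the question:

```python
import pandas as pd

# Push the row selection into SQL: sort newest-first and let SQLite
# return only 10 rows, instead of streaming the whole table in chunks.
table_name = print_name + "_df"
query = f"SELECT * FROM {table_name} ORDER BY id DESC LIMIT 10"

with db_engine.connect() as conn:
    df = pd.read_sql(query, conn)

# The rows come back newest-first; reverse them if the calculation
# expects ascending order.
df = df.iloc[::-1].reset_index(drop=True)
```

With LIMIT pushed into the query, SQLite only materializes 10 rows, so there is no need for stream_results or chunked iteration at all.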
