How to update db by reading constantly updated csv without multiple selects in python?

Question

Let assume I have some table in db created by reading csv file (by line). CSV file updates sometimes and I download it from remote resource. Now code looks like this:

import asyncio
import aiohttp
import aiopg
import zipfile

async def update_db():
    zipfile.ZipFile(ZIP_FILENAME).extract(CSV_FILENAME)
    async with aiopg.create_pool(dsn) as pool:
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                with open(CSV_FILENAME) as file:
                    headers = next(file)
                    for line in file:
                        region, city = [col for col in line.split(';', COLUMN_QUANTITY)]
                        await cur.execute(f"select id from region where title=%s;", (region,))
                        response_list = list(cur)
                        if len(response_list) == 0:
                            await cur.execute(f"insert into region (title) values (%s) returning id", (region,))
                            region_id = list(cur)[0][0]
                        else:
                            region_id = response_list[0][0]
... ... ...

How can I detect new rows in csv and create new corteges in table without making extra queries to db (whithout checking each line from csv if it exists in table)?

I'm using windows, python3 and PostrgreSQL. When I download updated csv I have no access to old file.

You should submit the code you have tried already to help you find your problem — MJK, Jul 30 '18 at 12:56
Have you tried running a file watching script that can trigger the database update on a file changed event? The code would be helpful — krflol, Jul 30 '18 at 16:25
this might help: https://stackoverflow.com/questions/12523044/how-can-i-tail-a-log-file-in-python — Jasen, Jul 31 '18 at 05:07

score 0 · Answer 1 · answered Jul 30 '18 at 13:05

0

A simple way would be saving the last line read from CSV on a file or database. This way you would know which lines are new on the CSV.

answered Jul 30 '18 at 13:05

Lucas Wieloch

818
7
19

The problem is that new lines don't append to file. Also I don't have access to old csv (when upload new file). – Roman Shchuka Jul 30 '18 at 13:17
I don't think CSV files track which rows are updated. Maybe you could add a new column on the CSV to control this? – Lucas Wieloch Jul 30 '18 at 13:43

score 0 · Answer 2 · answered Jul 30 '18 at 16:10

There are several ways, depending on your situation. For example, if changes to the file are made by one script, you can update the script to make a copy of the csv file before making updates and then get the differences. Linux "diff" command, for example, will tell you what lines have been added.
However, it sounds like you just might need to know if the csv file was updated so that you know to update the DB table. You can accomplish that (Linux again is the example) by running a cron job at certain intervals and determining if the file has changed in the last x minutes with "ls -l" command.

How to update db by reading constantly updated csv without multiple selects in python?

2 Answers2