0

Let assume I have some table in db created by reading csv file (by line). CSV file updates sometimes and I download it from remote resource. Now code looks like this:

import asyncio
import aiohttp
import aiopg
import zipfile

async def update_db():
    zipfile.ZipFile(ZIP_FILENAME).extract(CSV_FILENAME)
    async with aiopg.create_pool(dsn) as pool:
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                with open(CSV_FILENAME) as file:
                    headers = next(file)
                    for line in file:
                        region, city = [col for col in line.split(';', COLUMN_QUANTITY)]
                        await cur.execute(f"select id from region where title=%s;", (region,))
                        response_list = list(cur)
                        if len(response_list) == 0:
                            await cur.execute(f"insert into region (title) values (%s) returning id", (region,))
                            region_id = list(cur)[0][0]
                        else:
                            region_id = response_list[0][0]
... ... ...

How can I detect new rows in csv and create new corteges in table without making extra queries to db (whithout checking each line from csv if it exists in table)?

I'm using windows, python3 and PostrgreSQL. When I download updated csv I have no access to old file.

  • You should submit the code you have tried already to help you find your problem – MJK Jul 30 '18 at 12:56
  • Have you tried running a file watching script that can trigger the database update on a file changed event? The code would be helpful – krflol Jul 30 '18 at 16:25
  • this might help: https://stackoverflow.com/questions/12523044/how-can-i-tail-a-log-file-in-python – Jasen Jul 31 '18 at 05:07

2 Answers2

0

A simple way would be saving the last line read from CSV on a file or database. This way you would know which lines are new on the CSV.

Lucas Wieloch
  • 818
  • 7
  • 19
0

There are several ways, depending on your situation. For example, if changes to the file are made by one script, you can update the script to make a copy of the csv file before making updates and then get the differences. Linux "diff" command, for example, will tell you what lines have been added.
However, it sounds like you just might need to know if the csv file was updated so that you know to update the DB table. You can accomplish that (Linux again is the example) by running a cron job at certain intervals and determining if the file has changed in the last x minutes with "ls -l" command.

postoronnim
  • 486
  • 2
  • 10
  • 20