
I have a table in a PostgreSQL database with the columns name and email. The name column is already populated, but the email column is empty. I also have a CSV file with the columns user and user_email, both populated. How can I use the CSV file to update the email column, matching each user to the corresponding name and filling in that row's email with user_email?

I've been looking up answers and going through some tutorials, but I wasn't quite sure how to word the problem, so I wasn't able to find anything.

This is what my Postgres table looks like right now:

| name     | email    |
|----------|----------|
| joh      |          |
| alex     |          |
| adams    |          |

This is my CSV file:

| user     | user_email |
|----------|------------|
| joh      | a@g.com    |
| alex     | a@g.com    |
| adams    | a@g.com    |

2 Answers


You need to create an intermediate table (aka "staging table").

Then import the CSV file into that table. After that you can update the target table from the imported data:

update target_table
   set email = st.user_email
from staging_table st
where st."user" = target_table.name;

This assumes that name is unique in your target table and that every user appears exactly once in the input file.
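For reference, a minimal sketch of the full flow, assuming the target table is named `target_table` and you load the CSV from psql with `\copy` (the staging table name and file path are placeholders to adapt to your setup):

-- Staging table mirroring the CSV columns
CREATE TABLE staging_table (
    "user"     text,
    user_email text
);

-- Load the CSV into the staging table; HEADER skips the first line of the file
\copy staging_table ("user", user_email) FROM 'users.csv' WITH (FORMAT csv, HEADER true)

-- Update the target table from the staged data
UPDATE target_table
   SET email = st.user_email
  FROM staging_table st
 WHERE st."user" = target_table.name;

-- Drop the staging table once the update is done
DROP TABLE staging_table;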

  • This did not work for me (if I understand the question correctly), whereas this answer at https://stackoverflow.com/a/8910810/1904943 did work (see also https://www.oodlestechnologies.com/blogs/Postgres-Sql-Update-Record-in-Bulk-from-CSV-file/). I wanted to update a column in a table with updated values from an external file. Basically: `CREATE TEMP TABLE temp_table ...; UPDATE original_table SET col = temp_table.col FROM temp_table WHERE original_table.id = temp_table.id;`. Choose another matching column (than `id`), if available, if you don't want to mess with your `id` values. – Victoria Stuart Jun 27 '19 at 21:43

Here is a way using Python and SQLAlchemy. This example uses a MySQL connection.

First, read your CSV into a pandas DataFrame. Then reflect the table in your database that you want to update; reflecting a database object loads information about it from the corresponding schema object that already exists in the database. Finally, iterate through the DataFrame (with iterrows()) and execute an update statement against the table for each row.

import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy.ext.automap import automap_base
from sqlalchemy import create_engine, update
import pandas as pd

# Read the csv containing emails to be updated
emails_to_be_updated = pd.read_csv('~/Path/To/Your/file.csv', encoding="utf8")

connection_string = "connection:string:to:yourdb"
engine = create_engine(connection_string, echo=False)

# SQLAlchemy: reflect the existing tables from the database schema
Base = automap_base()
Base.prepare(engine, reflect=True)

# Mapped classes are created with names matching the table names by default.
Database_Table = Base.classes.name_email_table

# Iterate through the rows that need to be updated and execute one UPDATE per row:
# set the email where the table's name matches the CSV's user column.
with engine.begin() as connection:
    for index, row in emails_to_be_updated.iterrows():
        update_statement = (
            update(Database_Table)
            .where(Database_Table.name == row['user'])
            .values(email=row['user_email'])
        )
        connection.execute(update_statement)