
I am writing into a database in batches of X users at a time, and I want to update the user information (replace a row). I only want to have one row per user.

However, when using the parameter if_exists='replace', the new batch replaces the old batch (deleting previously written, distinct, users).

And if I do 'append' instead, I will end up with loads of duplicates.

My table uses multiple columns as its primary key, so I am using this definition from krvkir: https://stackoverflow.com/a/31045044/9153261

import pandas as pd
from sqlalchemy.types import TypeEngine, to_instance

def to_sql_k(self, frame, name, if_exists='fail', index=False, index_label=None,
             schema=None, chunksize=None, dtype=None, **kwargs):
    '''Like pandas to_sql, but forwards `keys` so the created table gets a PK.'''
    if dtype is not None:
        for col, my_type in dtype.items():
            if not isinstance(to_instance(my_type), TypeEngine):
                raise ValueError('The type of %s is not a SQLAlchemy '
                                 'type' % col)

    table = pd.io.sql.SQLTable(name, self, frame=frame, index=index,
                               if_exists=if_exists, index_label=index_label,
                               schema=schema, dtype=dtype, **kwargs)
    table.create()
    table.insert(chunksize)

And I am using this other function to write the information into the database:

import sys
from sqlalchemy import INTEGER

def writing_in_db(df, user_list, engine):
    try:
        ddtype = {
            'User': INTEGER,
            'C1': INTEGER,
            'C2': INTEGER
        }

        pandas_sql = pd.io.sql.pandasSQL_builder(engine)
        to_sql_k(pandas_sql, df, 'TableName', keys=('User', 'C1'),
                 if_exists='append', dtype=ddtype)

    except Exception:
        print("Exception, this list of users was not updated in the database:\n")
        sys.stdout.flush()
        print(user_list)
        sys.stdout.flush()

Is there any solution that isn't 'delete ALL the contents of the table before every iteration'?

And ideally one that also isn't 'query all the existing users in the DB and iterate to see whether they are present in the new dataframe to export'.
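One pattern that avoids both of those: write each batch to a staging table with if_exists='replace', then merge the staging table into the main table with a single upsert statement. This is a hypothetical sketch, not the code from the question; it uses SQLite's ON CONFLICT syntax so it runs self-contained (MySQL would use INSERT ... ON DUPLICATE KEY UPDATE instead), and the table and column names (TableName, User, C1, C2) are taken from the question.

```python
import sqlite3
import pandas as pd

# In-memory database standing in for the real one; the main table has
# the composite primary key (User, C1) from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableName (User INTEGER, C1 INTEGER, C2 INTEGER, "
    "PRIMARY KEY (User, C1))"
)

def upsert_batch(df, conn):
    # Stage the batch, replacing any previous staging data.
    df.to_sql("staging", conn, if_exists="replace", index=False)
    # Merge staging into the main table: insert new keys, update
    # existing ones. The `WHERE true` works around SQLite's parsing
    # ambiguity for upsert-from-SELECT.
    conn.execute(
        "INSERT INTO TableName (User, C1, C2) "
        "SELECT User, C1, C2 FROM staging WHERE true "
        "ON CONFLICT(User, C1) DO UPDATE SET C2 = excluded.C2"
    )
    conn.commit()

# Two overlapping batches: user (2, 0) appears in both and should end
# up with the value from the second batch, with no duplicate row.
batch1 = pd.DataFrame({"User": [1, 2], "C1": [0, 0], "C2": [10, 20]})
batch2 = pd.DataFrame({"User": [2, 3], "C1": [0, 0], "C2": [99, 30]})
upsert_batch(batch1, conn)
upsert_batch(batch2, conn)
```

After both batches the table holds three rows, one per (User, C1) key, and the overlapping user carries the newer value.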

Salut,

prp
    Pandas to_sql if_exists='replace' doesn't operate over the row, just over the table. If you set **replace** it will drop the table and create a new one; if you set **append** it will insert rows into the same table. – nacho May 21 '19 at 09:32
  • Any alternative? I also tried writing all the df into the DB at once, but it gets too big and I am often getting an error because of it. – prp May 21 '19 at 09:35
  • 1
    I am afraid the only alternative I know is to iterate over the dataframe and, for each row, use **INSERT ... ON DUPLICATE KEY UPDATE ...** – nacho May 21 '19 at 09:38
  • In that case, the 'less-bad' option would be to do **append** and then issue a MySQL command to delete duplicates? – prp May 21 '19 at 09:49
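The statement nacho suggests can also be executed once per batch rather than once per row, by binding parameters with executemany. A hypothetical sketch of building that MySQL statement for the question's table (TableName with key columns User, C1 and value column C2); the helper name upsert_sql is made up for illustration:

```python
def upsert_sql(table, key_cols, val_cols):
    # Build a MySQL INSERT ... ON DUPLICATE KEY UPDATE statement with
    # %s placeholders, suitable for cursor.executemany with a DB-API
    # driver such as pymysql or mysqlclient.
    cols = list(key_cols) + list(val_cols)
    placeholders = ", ".join(["%s"] * len(cols))
    updates = ", ".join("%s = VALUES(%s)" % (c, c) for c in val_cols)
    return ("INSERT INTO %s (%s) VALUES (%s) ON DUPLICATE KEY UPDATE %s"
            % (table, ", ".join(cols), placeholders, updates))

stmt = upsert_sql("TableName", ("User", "C1"), ("C2",))
# With a DB-API cursor, one round trip per batch:
#   cursor.executemany(stmt, df[["User", "C1", "C2"]].values.tolist())
```

On key collision only the non-key columns are updated, so each (User, C1) pair keeps a single row without deleting anything first.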

0 Answers