0

I have a inquiry from my job to create 2 tables related one-to-one and insert some rows on the table. I have a .CSV with the data dictionary (column name and data type) from the table and I am wondering to know how to declare the tables columns automatically (declarative syntax) without write one by one column (there are 260 columns). Same thing for the insert, how to add rows to the multiple columns table without write column by column?

I have the data in a Data frame but I was not able to insert it using df.to_sql from pandas. Do you guys have any similar example?

The database used is MySQL. The table structure in the database is showed below: enter image description here

Below is what I did. I have created a function to define the table constraints.

def create_mysql_tables(dataframe, engine):

# print(dataframe.iloc)
if 'Active' in dataframe.columns:
    start = time.time()

    engine.execute('DROP TABLE IF EXISTS com_treb;')
    engine.execute('DROP TABLE IF EXISTS cnd_treb;')
    engine.execute('DROP TABLE IF EXISTS res_treb;')
    engine.execute('DROP TABLE IF EXISTS mls_treb;')

    dataframe.to_sql("mls_treb", if_exists='replace', con=engine,
                     dtype={'mls_number': VARCHAR(dataframe.index.get_level_values('mls_number').str.len().max())})

    with engine.connect() as con:
        con.execute('ALTER TABLE `mls_treb` ADD PRIMARY KEY (`mls_number`);')

    end = time.time()
    print(end - start)

First questions is, there is a better way to define a primary key using the SQLALCHEMY avoiding to write the query?

After I have created the table I have to insert some row if it doesn't exist. I was trying to do it by using the function below.

def crud_database(engine, mls_full_dataframe, res_full_dataframe):

for index, row in mls_full_dataframe.iterrows():

    row_db_mls = pd.read_sql_query("SELECT * FROM mls_treb WHERE `mls_number` LIKE %(mlsnumber)s ", engine, params={'mlsnumber' : row.at['mls_number']})
    if row_db_mls.empty:
        
        row.to_sql("mls_treb", if_exists='append', con=engine, index_label='mls_number' )
        if row.at['Class'][0] == 'RES':
            row_res = res_full_dataframe.iloc[row.at['mls_number']]
    else:
        print(0)

but I am getting the error below when I am trying to insert the row from picture below: image from row to be inserted with index 0

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1782, in _execute_context
    self.dialect.do_executemany(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 729, in do_executemany
    cursor.executemany(statement, parameters)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/cursor.py", line 670, in executemany
    return self.execute(stmt)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/cursor.py", line 568, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/connection.py", line 854, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/connection.py", line 664, in _handle_result
    raise errors.get_exception(packet)
mysql.connector.errors.ProgrammingError: 1054 (42S22): Unknown column '3' in 'field list'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/sql.py", line 1419, in to_sql
    raise err
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/sql.py", line 1411, in to_sql
    table.insert(chunksize, method=method)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/sql.py", line 845, in insert
    exec_insert(conn, keys, chunk_iter)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/io/sql.py", line 762, in _execute_insert
    conn.execute(self.table.insert(), data)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1289, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1481, in _execute_clauseelement
    ret = self._execute_context(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1845, in _execute_context
    self._handle_dbapi_exception(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2026, in _handle_dbapi_exception
    util.raise_(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1782, in _execute_context
    self.dialect.do_executemany(
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 729, in do_executemany
    cursor.executemany(statement, parameters)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/cursor.py", line 670, in executemany
    return self.execute(stmt)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/cursor.py", line 568, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/connection.py", line 854, in cmd_query
    result = self._handle_result(self._send_cmd(ServerCmd.QUERY, query))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/mysql/connector/connection.py", line 664, in _handle_result
    raise errors.get_exception(packet)
sqlalchemy.exc.ProgrammingError: (mysql.connector.errors.ProgrammingError) 1054 (42S22): Unknown column '3' in 'field list'
[SQL: INSERT INTO mls_treb (mls_number, `3`) VALUES (%(mls_number)s, %(3)s)]
[parameters: ({'mls_number': 'mls_number', '3': 'N5404949'}, {'mls_number': 'active', '3': True}, {'mls_number': 'class_name', '3': 'RES'}, {'mls_number': 'active_date', '3': datetime.date(2022, 1, 16)})]
(Background on this error at: https://sqlalche.me/e/14/f405)
  • 1
    Please provide enough code so others can better understand or reproduce the problem. – Community Feb 23 '22 at 21:58
  • 1
    Please [edit] your question with more details. For example, what happened when you tried to use `.to_sql()`? Did you get an error message? If so, what did it say? Also, what database are you using? – Gord Thompson Feb 23 '22 at 22:58
  • You are mixing up column *names* with column *data*. `Unknown column '3' in 'field list'` suggests that your `row` has a column named `3` and there is no such column in the `mls_treb` table. Also you seem to be using the index value of the row/DataFrame which almost certainly has no meaning in the database table. – Gord Thompson Feb 24 '22 at 14:19
  • the index 3 is the dataframe row to be inserted. The dataframe row has the information as described: 3, 'N5404949', 'True', 'RES', '2022-01-16'. How to ignore the index when using this line row.to_sql("mls_treb", if_exists='append', con=engine, index_label='mls_number' ). – janio mendonca junior Feb 24 '22 at 14:35
  • the database table structure is:+------------+------------+--------+-------------+ | mls_number | class_name | active | active_date | +------------+------------+--------+-------------+ | C4659618 | COM | 1 | 2022-01-28 | | C5405199 | RES | 1 | 2022-01-03 | | C5441169 | CND | 1 | 2022-01-15 | +------------+------------+--------+-------------+ – janio mendonca junior Feb 24 '22 at 14:41
  • related; https://stackoverflow.com/a/63888806/2144390 – Gord Thompson Feb 24 '22 at 16:23
  • Hey Gord Thompson, I managed to fix the problem. I was trying to append a series instead a dataframe....just realize it after your comment, thank you. How can I close this topic as solved? – janio mendonca junior Feb 25 '22 at 12:41
  • You can add an answer with a brief explanation of the solution and then accept the answer. (Here on SO it is perfectly acceptable to answer your own question.) – Gord Thompson Feb 28 '22 at 18:05

1 Answers1

0

With Gord Thompsom help I could find the answer for problem. Prior to use the row to insert

row_to_insert = mls_full_dataframe.loc[mls_full_dataframe['mls_number'] == row.at['mls_number']]
       
row_to_insert.to_sql("mls_treb", if_exists='append', con=engine, index=False)

I was getting a series instead of row as dataframe. So, when I was trying to insert the row as serie, the number '3' was the index used as key and the compiler did not find the column name properly with that name "3". To fix it, I was retrieving the row_to_insert as dataframe and trying to insert it into database.