Python/ SQL : replacing the empty strings of a DataFrame by a "Null" value to insert the data in a database

Question

Let's say that I have this DataFrame:

REFERENCE = ["GZF882348G", "SFGUZBJLNJU", "FTLNGZ242112", "DFBHGVGHG543"]
IBAN = ["FR7343563", "FR4832545", "FR9858331", "FR2001045"]
DEBIT = [26, '', 856, '']
CREDIT = ['', 324, '', 876]
MONTANT = [641, 33, '', 968]

df = pd.DataFrame({'Référence' : REFERENCE, 'IBAN' : IBAN, 'Débit' : DEBIT, 'Crédit' : CREDIT, 'Montant' : MONTANT})

I have a formatting problem when inserting this kind of data into my database. The columns "Débit", "Crédit" and "Montant" are defined to hold floats. However, these columns contain not only integers but also empty strings, and that is my issue. I know that I have to write a condition that replaces an empty string with a NULL value in SQL, but I do not know how to do that in Python or in SQL. I am still learning SQL.

Here is my code:

import pandas as pd
import pyodbc 

server = '...'
database = '...'
username = '...' 
password = '...'
driver = '...'

connection = pyodbc.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+password)
cursor = connection.cursor()

for i, row in df.iterrows():
    sql_exe = "INSERT INTO dbo.tbl_data_xml (Réference,IBAN,Débit,Crédit,Montant) VALUES (?,?,?,?,?)"
    cursor.execute(sql_exe, tuple(row))

    connection.commit()

Can anyone help me, please?

Thank you.

Parfait · Accepted Answer · 2020-11-30T21:55:07.927

Your pandas DataFrame mixes types: the empty string, '', sits in the same column as integers, which is why every column has the object dtype. A relational database column cannot mix data types. Converting '' to the string 'NULL' will not resolve the issue either, because in SQL, NULL is not the string 'NULL'.

df.dtypes

# Référence    object
# IBAN         object
# Débit        object
# Crédit       object
# Montant      object
# dtype: object
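
The difference between a real NULL and the string 'NULL' can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the real database, purely so that it runs without a server; the table name `demo` and its columns are hypothetical. SQLite is lenient and stores the string as text, whereas a strictly typed float column would be expected to reject it.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (label TEXT, amount REAL)")

# Python's None is bound as a real SQL NULL ...
conn.execute("INSERT INTO demo VALUES (?, ?)", ("real null", None))
# ... while the string 'NULL' is just four characters of text.
conn.execute("INSERT INTO demo VALUES (?, ?)", ("text null", "NULL"))

rows = conn.execute(
    "SELECT label, amount IS NULL, typeof(amount) FROM demo ORDER BY label"
).fetchall()

for label, is_null, sql_type in rows:
    print(label, bool(is_null), sql_type)
```

Only the row inserted with `None` satisfies `IS NULL`; the other row holds a text value.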

Therefore, convert the columns to numeric with pd.to_numeric. The empty string, '', becomes NaN, which should in turn be translated to SQL's NULL.

df[['Débit', 'Crédit', 'Montant']] = df[['Débit', 'Crédit', 'Montant']].apply(pd.to_numeric)

df.dtypes
# Référence     object
# IBAN          object
# Débit        float64
# Crédit       float64
# Montant      float64
# dtype: object

df
#       Référence       IBAN  Débit  Crédit  Montant
# 0    GZF882348G  FR7343563   26.0     NaN    641.0
# 1   SFGUZBJLNJU  FR4832545    NaN   324.0     33.0
# 2  FTLNGZ242112  FR9858331  856.0     NaN      NaN
# 3  DFBHGVGHG543  FR2001045    NaN   876.0    968.0

Then run your query. Better still, avoid the slow for loop over iterrows and consider df.to_numpy with cursor.executemany.

# PREPARED STATEMENT
sql_exe = "INSERT INTO dbo.tbl_data_xml (Réference,IBAN,Débit,Crédit,Montant) VALUES (?,?,?,?,?)"

# CONVERT REMAINING NaN TO None, THEN DATA TO A LIST OF ROW LISTS
sql_data = df.astype(object).where(pd.notnull(df), None).to_numpy().tolist()

# EXECUTE ACTION QUERY
cursor.executemany(sql_exe, sql_data)
connection.commit()
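
The whole flow above can be tried end to end with a minimal sketch. It assumes an in-memory SQLite database in place of the pyodbc connection, so the driver and server behavior from the question are not exercised here; the unaccented column names in the CREATE TABLE statement and the `num_cols` variable are hypothetical. The `astype(object)` step is there so that `None` can be stored in the formerly float columns.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({
    "Référence": ["GZF882348G", "SFGUZBJLNJU", "FTLNGZ242112", "DFBHGVGHG543"],
    "IBAN": ["FR7343563", "FR4832545", "FR9858331", "FR2001045"],
    "Débit": [26, "", 856, ""],
    "Crédit": ["", 324, "", 876],
    "Montant": [641, 33, "", 968],
})

# Empty strings become NaN and the columns become float64.
num_cols = ["Débit", "Crédit", "Montant"]
df[num_cols] = df[num_cols].apply(pd.to_numeric)

# Cast to object first so that None can replace NaN, then build row lists.
sql_data = df.astype(object).where(df.notna(), None).to_numpy().tolist()

# SQLite stand-in for the real table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_data_xml "
    "(Reference TEXT, IBAN TEXT, Debit REAL, Credit REAL, Montant REAL)"
)
conn.executemany("INSERT INTO tbl_data_xml VALUES (?,?,?,?,?)", sql_data)
conn.commit()

null_debits = conn.execute(
    "SELECT COUNT(*) FROM tbl_data_xml WHERE Debit IS NULL"
).fetchone()[0]
print(null_debits)
```

With pyodbc, the same `sql_data` list would be passed to `cursor.executemany` as shown above.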

edited Nov 30 '20 at 21:55

answered Nov 30 '20 at 02:13

Thank you @Parfait for your help, it was exactly what I was looking for. However, I cannot yet say that it works, because I get a new error that I do not understand: "The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 9 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision." My DataFrame actually has 9 columns, and I have defined these columns as (float, null) in the database – Maikiii Nov 30 '20 at 15:20
It sounds like you did not properly convert the needed columns, since an empty string is still coming through. Can you post a data frame sample of 9 columns rather than 5 columns? Also, are you adjusting the insert columns of the SQL statement? Try to stay as consistent with this answer as possible when adding or removing columns. Please post your *new* attempt. – Parfait Nov 30 '20 at 15:46
Thank you, you can find it in this new post: https://stackoverflow.com/questions/65080139/python-and-sql-replacing-the-empty-strings-of-a-dataframe-by-a-null-value-of If you can help me, I would appreciate it – Maikiii Nov 30 '20 at 19:45
Technically your question is a duplicate of this one, which leaves this one unresolved. But I see you have an answer. – Parfait Nov 30 '20 at 21:48
Thank you for your help. If you have another way to do it, I am open to it – Maikiii Nov 30 '20 at 21:57
See my edit, converting remaining `NaN` to `None` using `DataFrame.where`. – Parfait Nov 30 '20 at 22:00

score 0 · Answer 2 · answered Nov 29 '20 at 21:35

Convert the relevant columns to numeric, then fill the missing values with fillna('NULL'):

df[['Débit', 'Crédit', 'Montant']]=df.iloc[:,2:].apply(lambda x: pd.to_numeric(x).fillna('NULL'))
     Référence       IBAN Débit Crédit Montant
0    GZF882348G  FR7343563    26   NULL     641
1   SFGUZBJLNJU  FR4832545  NULL    324      33
2  FTLNGZ242112  FR9858331   856   NULL    NULL
3  DFBHGVGHG543  FR2001045  NULL    876     968
