MS-Access and Python - problem with NaN in database

Question

I'm in the process of learning the pandas library. My task is to download the table from the website, transform it and send it to the database - in this case to ms-access. I download the data to my DataFrame.

My problem is that selected table in one of the columns (concerning prices) has value '-'. Looking for information how to deal with it I found 3 main possibilities:

Using 'replace' character '-' to 0. However, this solution does not meet my expectations because the value '-' means no data and not it`s value equal to 0.
The replacement of '-' with an empty string - this solution will not pass, because after changes the column has the data type - float.
Replace '-' with NaN using - .replace('-',np.nan) - This possibility is closest to solving my problem, but after loading data to the access using the "pyodbc" library the replaced records have the value '1,#QNAN'. I'm betting that such a format accepts Access for NaN type, but the problem occurs when I would like to pull the average from the column using SQL:

sql SELECT AVG (nameColumns) FROM nameTable name

returns the 'Overflow' message.

Does anyone have any idea what to do with '-'? Is there any way that the numeric field after loading is just empty?

EDIT - more code:

conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=PathToDB;')
cursor = conn.cursor()
for index,row in df.iterrows():
    cursor.execute("INSERT INTO tableName(col1,col2,col3) VALUES (?,?,?)", 
                   row['col1'], row['col2'],row['col3'])
conn.commit()
cursor.close()
conn.close()

EDIT 2 - more code

import pandas as pd
d ={'col1': [1,2,'-'],'col2':[5,'-',3]}
dfstack = pd.DataFrame(data=d)
dfstack.head()
dfstack = dfstack.replace("-",None)
dfstack.head()

How are you adding data to the database? If you use `to_sql` it convert NaN's to NULL: https://stackoverflow.com/questions/23353732/python-pandas-write-to-sql-with-nan-values — NYC Coder, May 26 '20 at 18:34

score 0 · Answer 1 · answered May 26 '20 at 18:30

0

Maybe you could replace - with the None keyword in python? I'm not sure how pyodbc works but SQL will ignore NULL values with its AVG function and pyodbc might convert None to NULL.

https://www.sqlservertutorial.net/sql-server-aggregate-functions/sql-server-avg/

answered May 26 '20 at 18:30

Tyler Costello

1
1

In each field where I have a dash, only the value from the last record that had a value is added. – Jakub Bidziński May 26 '20 at 18:45

NYC Coder · Answer 2 · 2020-05-26T19:16:30.607

0

You need to replace the '-' with None, that seems to convert it to NULL when inserting using pyodbc:

dfstack = dfstack.where(dfstack!='-', None)

edited May 26 '20 at 19:16

answered May 26 '20 at 18:54

NYC Coder

7,424
2
11
24

I added more code to the main post. If you could look at it. Make the first 4 lines, the last 2 comment. Once you've done that, comment on the first 4 lines, and comment on the last 2 and do it. Unfortunately, the values are not as I would like them to be. – Jakub Bidziński May 26 '20 at 19:10
The edited code correctly converts '-' to None. However, I still have a problem with the database. In the database this column is float type.If I change data type in df to float, None automatically changes to NaN, and if I don't change data type in the data frame, the code given in the post will throw an error 'Data type mismatch...'. – Jakub Bidziński May 26 '20 at 19:31

MS-Access and Python - problem with NaN in database

2 Answers2