1

I'm in the process of learning the pandas library. My task is to download the table from the website, transform it and send it to the database - in this case to ms-access. I download the data to my DataFrame.

My problem is that selected table in one of the columns (concerning prices) has value '-'. Looking for information how to deal with it I found 3 main possibilities:

  1. Using 'replace' character '-' to 0. However, this solution does not meet my expectations because the value '-' means no data and not it`s value equal to 0.
  2. The replacement of '-' with an empty string - this solution will not pass, because after changes the column has the data type - float.
  3. Replace '-' with NaN using - .replace('-',np.nan) - This possibility is closest to solving my problem, but after loading data to the access using the "pyodbc" library the replaced records have the value '1,#QNAN'. I'm betting that such a format accepts Access for NaN type, but the problem occurs when I would like to pull the average from the column using SQL:
sql SELECT AVG (nameColumns) FROM nameTable name

returns the 'Overflow' message.

Does anyone have any idea what to do with '-'? Is there any way that the numeric field after loading is just empty?


EDIT - more code:

conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=PathToDB;')
cursor = conn.cursor()
for index,row in df.iterrows():
    cursor.execute("INSERT INTO tableName(col1,col2,col3) VALUES (?,?,?)", 
                   row['col1'], row['col2'],row['col3'])
conn.commit()
cursor.close()
conn.close()

EDIT 2 - more code

import pandas as pd
d ={'col1': [1,2,'-'],'col2':[5,'-',3]}
dfstack = pd.DataFrame(data=d)
dfstack.head()
dfstack = dfstack.replace("-",None)
dfstack.head()

2 Answers2

0

Maybe you could replace - with the None keyword in python? I'm not sure how pyodbc works but SQL will ignore NULL values with its AVG function and pyodbc might convert None to NULL.

https://www.sqlservertutorial.net/sql-server-aggregate-functions/sql-server-avg/

0

You need to replace the '-' with None, that seems to convert it to NULL when inserting using pyodbc:

dfstack = dfstack.where(dfstack!='-', None)
NYC Coder
  • 7,424
  • 2
  • 11
  • 24
  • I added more code to the main post. If you could look at it. Make the first 4 lines, the last 2 comment. Once you've done that, comment on the first 4 lines, and comment on the last 2 and do it. Unfortunately, the values are not as I would like them to be. – Jakub Bidziński May 26 '20 at 19:10
  • The edited code correctly converts '-' to None. However, I still have a problem with the database. In the database this column is float type.If I change data type in df to float, None automatically changes to NaN, and if I don't change data type in the data frame, the code given in the post will throw an error 'Data type mismatch...'. – Jakub Bidziński May 26 '20 at 19:31