I'm in the process of learning the pandas library. My task is to download a table from a website, transform it and load it into a database, in this case MS Access. I download the data into a DataFrame.
My problem is that one of the columns in the selected table (the prices column) contains the value '-'. While looking for how to deal with it, I found three main options:
- Use replace to change '-' to 0. This does not meet my expectations, because '-' means that there is no data, not that the value is 0.
- Replace '-' with an empty string. This will not work, because after the transformation the column is supposed to have the float data type.
- Replace '-' with NaN using .replace('-', np.nan). This option is the closest to solving my problem, but after loading the data into Access with the pyodbc library, the replaced records have the value '1,#QNAN'. I assume this is the format Access uses for NaN, but the problem occurs when I want to compute the average of the column in SQL:
SELECT AVG(nameColumns) FROM nameTable
which returns an 'Overflow' error.
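A small sketch of why the first and third options behave differently, on a made-up price column (the values are hypothetical). It uses pd.to_numeric with errors="coerce", which turns anything non-numeric such as '-' into NaN and is an alternative to .replace('-', np.nan):

```python
import numpy as np
import pandas as pd

# Hypothetical price column as it might arrive from an HTML table (all strings)
prices = pd.Series(["10.0", "-", "20.0"])

# Option 1: '-' -> 0, then convert to numbers; the 0 is counted in the mean
as_zero = pd.to_numeric(prices.replace("-", "0"))

# Option 3: '-' -> NaN; errors="coerce" does the replacement and conversion in one step
as_nan = pd.to_numeric(prices, errors="coerce")

print(as_zero.mean())        # (10 + 0 + 20) / 3 = 10.0
print(as_nan.mean())         # NaN is skipped: (10 + 20) / 2 = 15.0
print(as_nan.dtype)          # float64
print(as_nan.isna().sum())   # 1 missing value
```

Treating '-' as missing keeps the column float64 while leaving the average unaffected, which matches the meaning "no data".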
Does anyone have an idea what to do with '-'? Is there a way for the numeric field to simply be empty after loading?
EDIT: more code:
import pyodbc

# Open the Access database (PathToDB is a placeholder for the .accdb/.mdb path)
conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=PathToDB;')
cursor = conn.cursor()
# Insert the DataFrame row by row
for index, row in df.iterrows():
    cursor.execute("INSERT INTO tableName(col1,col2,col3) VALUES (?,?,?)",
                   row['col1'], row['col2'], row['col3'])
conn.commit()
cursor.close()
conn.close()
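The insert loop can be exercised without Access by using Python's built-in sqlite3 module as a stand-in. This is an assumption for illustration only: SQLite is not Access, but both pyodbc and sqlite3 use ? placeholders, and pyodbc's cursor also provides executemany. Passing Python None as a parameter stores SQL NULL, and AVG skips NULLs, which is standard SQL behaviour:

```python
import sqlite3

# In-memory SQLite database standing in for the Access file (illustration only)
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE tableName (col1 REAL, col2 REAL, col3 REAL)")

# None marks "no data"; the driver sends it as SQL NULL
rows = [(1.0, 5.0, 2.0), (2.0, None, 4.0), (None, 3.0, 6.0)]
cursor.executemany("INSERT INTO tableName(col1,col2,col3) VALUES (?,?,?)", rows)
conn.commit()

# AVG ignores NULLs, so for col2 this is (5 + 3) / 2
avg_col2 = cursor.execute("SELECT AVG(col2) FROM tableName").fetchone()[0]
null_count = cursor.execute(
    "SELECT COUNT(*) FROM tableName WHERE col2 IS NULL").fetchone()[0]
print(avg_col2, null_count)  # 4.0 1

cursor.close()
conn.close()
```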
EDIT 2: more code:
import pandas as pd
d ={'col1': [1,2,'-'],'col2':[5,'-',3]}
dfstack = pd.DataFrame(data=d)
dfstack.head()
dfstack = dfstack.replace("-",None)
dfstack.head()
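Applied to the dfstack example, one possible approach (a sketch, not the only one) is to coerce each column to numbers, so '-' becomes NaN and both columns become float64, and then convert NaN to None only when building the row tuples for the database. pyodbc sends None as NULL, whereas NaN is an ordinary float value, which would explain the '1,#QNAN' entries:

```python
import pandas as pd

d = {'col1': [1, 2, '-'], 'col2': [5, '-', 3]}
dfstack = pd.DataFrame(data=d)

# '-' (or anything else non-numeric) becomes NaN; both columns end up float64
dfstack = dfstack.apply(pd.to_numeric, errors="coerce")

# Build plain Python tuples for the cursor, turning NaN into None (sent as NULL)
rows = [
    tuple(None if pd.isna(value) else float(value) for value in row)
    for row in dfstack.itertuples(index=False, name=None)
]
print(dfstack.dtypes.tolist())
print(rows)  # [(1.0, 5.0), (2.0, None), (None, 3.0)]
```

Each tuple can then be passed to cursor.execute(sql, *row), or the whole list to cursor.executemany(sql, rows), instead of reading values from iterrows().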