I've created an ETL and it works, but it takes a long time to update all the information. We read a table from a source database and replicate it in our own database, and we do that full copy every time. I've seen that you can do an incremental update instead, and I've tested a few approaches, but nothing has worked.
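To show what I mean by an update, this is roughly the kind of incremental read I have been trying. It is only a sketch: LAST_MODIFIED and last_sync are placeholder names for illustration, not real columns or values from my table, and conn is the same source connection used in the function below.

import pandas as pd

# Placeholder: the timestamp of the previous successful run, however it is stored
last_sync = '2024-01-01'

# Hypothetical incremental read: only pull rows changed since the last run
query = "SELECT * FROM DW.bi.INFORME_LOGISTICO WHERE LAST_MODIFIED > ?"
df_new = pd.read_sql_query(query, conn, params=[last_sync])
# ...then append or merge df_new into the destination instead of reloading everything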
Here is the function where I read the data from the source database and load it into ours:
import numpy as np
import pandas as pd
import pyodbc
from sqlalchemy import create_engine

# Read the full source table (conn is the connection to the source database)
query = "SELECT * FROM DW.bi.INFORME_LOGISTICO"
self.df = pd.read_sql_query(query, conn)

# Send the data to DW_BI
conn_destino = pyodbc.connect('Driver={SQL Server};'
                              'Server=xxx.xxx.x.xx;'
                              'Database=DWX;'
                              'Trusted_Connection=yes;')
cursor = conn_destino.cursor()
cursor.fast_executemany = True  # note: this cursor is never actually used below

engine = create_engine(
    'whatever', fast_executemany=True)  # real connection string redacted
# Execute to_sql to send the data to our DB in batches
BATCH_SIZE = 5000
suma = 0
# np.array_split(df, n) splits into n chunks, so compute how many chunks
# of roughly BATCH_SIZE rows are needed
n_chunks = max(1, len(self.df) // BATCH_SIZE)
for chunk in np.array_split(self.df, n_chunks):
    suma += int(chunk.shape[0])
    chunk.to_sql('INFORME_LOGISTICO',
                 con=engine, schema='dbo', if_exists='append', index=False)
The original table has more than 2 million rows, and I don't know if this is the best way to handle that much data.
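For example, would something like this be better than splitting the DataFrame myself? It reads the source in chunks and lets to_sql batch the inserts via its chunksize parameter, so the whole table is never held in memory at once. Again just a sketch, with the same redacted connection strings as above and conn being the source connection.

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('whatever', fast_executemany=True)  # destination connection string redacted

query = "SELECT * FROM DW.bi.INFORME_LOGISTICO"
# Stream the source table in chunks and let pandas batch the inserts
for chunk in pd.read_sql_query(query, conn, chunksize=50000):
    chunk.to_sql('INFORME_LOGISTICO', con=engine, schema='dbo',
                 if_exists='append', index=False, chunksize=5000)

Or is the real fix to stop copying the whole table and only move the rows that changed?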