
I've built an ETL and it works, but updating all the information takes a long time. We read a table from one database and replicate it in ours, and we do this over and over, reloading the full table every time. I've seen that you can do an incremental update instead, and I've tried a few things, but nothing has worked so far.
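This is roughly the incremental ("update only") approach I've been trying. It's just a sketch: I'm assuming the source table has a LAST_MODIFIED datetime column (mine may be named differently, or not exist at all), and it reuses the same conn / engine as in the code further down.

import pandas as pd

# Highest watermark already present in our copy of the table
last_loaded = pd.read_sql_query(
    "SELECT MAX(LAST_MODIFIED) AS mx FROM dbo.INFORME_LOGISTICO",
    engine)['mx'].iloc[0]

# Pull only the rows that changed since the last load, not the whole table
query = "SELECT * FROM DW.bi.INFORME_LOGISTICO WHERE LAST_MODIFIED > ?"
df_new = pd.read_sql_query(query, conn, params=[last_loaded])

# Append just the new/changed rows
df_new.to_sql('INFORME_LOGISTICO', con=engine, schema='dbo',
              if_exists='append', index=False)

As far as I understand, this only appends rows; rows that were updated in place at the source would end up duplicated unless I also run a MERGE/upsert on the destination side, and that's the part I can't get working.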

Here is the function where I read the data from the source database and load it into ours:

import math

import numpy as np
import pandas as pd
import pyodbc
from sqlalchemy import create_engine

# Read the full Informe_Logistico table from the source
query = "SELECT * FROM DW.bi.INFORME_LOGISTICO"
self.df = pd.read_sql_query(query, conn)  # conn: source connection, created earlier


# Send the data to DW_BI
conn_destino = pyodbc.connect('Driver={SQL Server};'
                              'Server=xxx.xxx.x.xx;'
                              'Database=DWX;'
                              'Trusted_Connection=yes;')

cursor = conn_destino.cursor()
cursor.fast_executemany = True  # note: unused below, since to_sql goes through the engine

print(self.df.shape)  # sanity check on the extracted size

engine = create_engine(
    'whatever', fast_executemany=True)  # connection string redacted

# Run to_sql in batches of roughly BATCH_SIZE rows.
# np.array_split takes the NUMBER of chunks, not the chunk size,
# so compute how many chunks of BATCH_SIZE rows are needed.
BATCH_SIZE = 5000
n_chunks = max(1, math.ceil(len(self.df) / BATCH_SIZE))
suma = 0
for chunk in np.array_split(self.df, n_chunks):
    suma += int(chunk.shape[0])  # running count of rows written
    chunk.to_sql('INFORME_LOGISTICO',
                 con=engine, schema='dbo', if_exists='append', index=False)

The source table has more than 2M rows, and I don't know if this is the best way to handle that much data.
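One alternative I've been looking at is streaming the read in chunks with pandas' chunksize, so the 2M rows are never all in memory at once. Again just a sketch: the connection strings are redacted the same way as above, and I'm assuming the source database is DW (taken from the three-part table name).

import pandas as pd
import pyodbc
from sqlalchemy import create_engine

# Source connection (details redacted)
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=xxx.xxx.x.xx;'
                      'Database=DW;'
                      'Trusted_Connection=yes;')

# Destination engine (connection string redacted)
engine = create_engine('whatever', fast_executemany=True)

query = "SELECT * FROM DW.bi.INFORME_LOGISTICO"
first = True
for chunk in pd.read_sql_query(query, conn, chunksize=50_000):
    # Rebuild the table on the first chunk, append the rest.
    # Caveat: if_exists='replace' recreates the table with
    # pandas-inferred column types, which may not match the original schema.
    chunk.to_sql('INFORME_LOGISTICO', con=engine, schema='dbo',
                 if_exists='replace' if first else 'append', index=False)
    first = False

I also noticed that to_sql has its own chunksize parameter, so maybe the manual np.array_split loop isn't needed at all. Is that the better way to handle a table this size?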
