
I am still a newbie with pandas. I have an Excel file that contains more than 700 columns and 600 rows. I am trying to split it into multiple dataframes so I can insert them into a SQL table. The reason I am splitting is that I get this error: `The statement has been terminated. (3621); [42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Cannot create a row of size 8349 which is greater than the allowable maximum row size of 8060.` I want to split it into 250 columns per dataframe, repeating the first column in each so I can join them back together in SQL.

Another reason I cannot hard-code the dataframe ranges is that different Excel files may have different numbers of columns. Say an Excel file contains 800 columns but I set the range as `700:850`; `iloc` then raises `IndexError: single positional indexer is out-of-bounds`.
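For what it's worth, plain `iloc` slices are clamped to the available columns; it is the array-style `np.r_` indexers used in the code below that raise when any position is out of range. A minimal illustration, using a hypothetical 5-column frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((2, 5)), columns=list("abcde"))

# a plain slice past the last column is clamped, no error
print(df.iloc[:, 3:10].shape)  # (2, 2)

# an explicit array of positions raises if any index is out of range
try:
    df.iloc[:, np.r_[0:1, 3:10]]
except IndexError as e:
    print("IndexError:", e)
```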

If there is a way to do this dynamically, splitting every 250 columns, that would be great. Below is my code, which works but requires me to specify the index ranges:

import sqlalchemy as sqla

def sqlcol(dfparam):
    # map each pandas dtype to a SQLAlchemy column type
    dtypedict = {}
    for i, j in zip(dfparam.columns, dfparam.dtypes):
        if "object" in str(j):
            dtypedict.update({i: sqla.types.NVARCHAR(length=255)})

        if "datetime" in str(j):
            dtypedict.update({i: sqla.types.DateTime()})

        if "float" in str(j):
            dtypedict.update({i: sqla.types.Float()})

        if "int" in str(j):
            dtypedict.update({i: sqla.types.BIGINT()})

        if "decimal" in str(j):
            dtypedict.update({i: sqla.types.DECIMAL()})

    return dtypedict

def import_varchar_to_hst03(db: str, tb_name: str, df1, df2, df3):
    import sqlalchemy as sqla
    import urllib

    # build a dtype mapping for each chunk
    t = sqlcol(df1)
    a = sqlcol(df2)
    c = sqlcol(df3)
    quoted = urllib.parse.quote_plus(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE="
        + db + ";Trusted_Connection=yes;")
    engine = sqla.create_engine(
        'mssql+pyodbc:///?odbc_connect={}'.format(quoted), fast_executemany=True)
    # write each chunk to its own table so no single row exceeds 8060 bytes
    df1.to_sql(tb_name, schema='dbo', con=engine, index=False, dtype=t, if_exists='replace')
    df2.to_sql(tb_name + "A", schema='dbo', con=engine, index=False, dtype=a, if_exists='replace')
    df3.to_sql(tb_name + "B", schema='dbo', con=engine, index=False, dtype=c, if_exists='replace')

import pandas as pd
import numpy as np

Ex = pd.read_excel(r'C:\Users\sriram.ramasamy\Desktop\Testsriram.xlsx', sheet_name=None)

for sh, v in Ex.items():
    df = pd.DataFrame(v)
    df1 = df.iloc[:, :255]
    df2 = df.iloc[:, np.r_[0:1, 256:500]]
    df3 = df.iloc[:, np.r_[0:1, 501:700]]
    import_varchar_to_hst03('InsightMaster', sh, df1, df2, df3)

print('data imported to database')
Sriram
  • Wouldn't it be as easy as using `df.columns` (for the number of columns) and looping through that in steps of 250? – FAB Dec 18 '22 at 11:29
  • See https://stackoverflow.com/questions/44729727 – user19077881 Dec 18 '22 at 11:30
  • @FAB can you show me an example of how to do it? – Sriram Dec 18 '22 at 11:34
  • @user19077881 I got the same error: Cannot create a row of size 8357 which is greater than the allowable maximum row size of 8060. The problem in my case is that there are too many columns, which makes a single row exceed 8060 bytes. That's why I am trying to insert by splitting the dataframes – Sriram Dec 18 '22 at 11:41
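FAB's suggestion can be sketched roughly as follows, using a hypothetical `split_columns` helper (not part of the original code) that chunks the frame every `chunk_size` columns and repeats the first (key) column in every chunk after the first, so the pieces can be joined back in SQL:

```python
import numpy as np
import pandas as pd

def split_columns(df, chunk_size=250):
    """Return dataframes of at most chunk_size columns each,
    prepending the first (key) column to every chunk after the first."""
    chunks = []
    for start in range(0, df.shape[1], chunk_size):
        part = df.iloc[:, start:start + chunk_size]
        if start > 0:
            # repeat the key column so the chunks can be joined in SQL
            part = pd.concat([df.iloc[:, :1], part], axis=1)
        chunks.append(part)
    return chunks

# example: 3 rows x 7 columns, chunked every 3 columns
df = pd.DataFrame(np.arange(21).reshape(3, 7),
                  columns=[f"c{i}" for i in range(7)])
parts = split_columns(df, chunk_size=3)
print([p.shape for p in parts])  # [(3, 3), (3, 4), (3, 2)]
```

Because the loop is driven by `df.shape[1]`, it adapts to however many columns each Excel file has, so no hard-coded ranges and no out-of-bounds indexers.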

0 Answers