
Still learning Python, so bear with me. I use the following script to import a CSV file into a local SQL Server database. My problem is that the CSV file usually has a bunch of empty rows at the end, and I get primary key errors on import. What's the best way to handle this? If I manually edit the CSV in a text editor and delete all the rows of ,,,,,,,,,,,,,,,,,,,,,,,,,,, it works perfectly.

Bonus question, is there an easy way to iterate through all .csv files in a directory, and then delete or move them after they've been processed?

import pandas as pd

data = pd.read_csv(r'C:\Bookings.csv')
df = pd.DataFrame(data, columns= ['BookingKey','BusinessUnit','BusinessUnitKey','DateTime','Number','Reference','ExternalId','AmountTax','AmountTotal','AmountPaid','AmountOpen','AmountTotalExcludingTax','BookingFee','MerchantFee','ProcessorFee','NumberOfPersons','Status','StatusDateTime','StartTime','EndTime','PlannedCheckinTime','ActualCheckinTime','Attendance','AttendanceDatetime','OnlineBookingCheckedDatetime','Origin','CustomerKey'])
df = df.fillna(value=0)
print(df)

import pyodbc

conn = pyodbc.connect('Driver={SQL Server};'
                      r'Server=D3VBUP\SQLEXPRESS;'  # raw string so the backslash is not treated as an escape
                      'Database=BRIQBI;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()

for row in df.itertuples():
    cursor.execute('''
                INSERT INTO BRIQBI.dbo.Bookings (BookingKey,BusinessUnit,BusinessUnitKey,DateTime,Number,Reference,ExternalId,AmountTax,AmountTotal,AmountPaid,AmountOpen,AmountTotalExcludingTax,BookingFee,MerchantFee,ProcessorFee,NumberOfPersons,Status,StatusDateTime,StartTime,EndTime,PlannedCheckinTime,ActualCheckinTime,Attendance,AttendanceDatetime,OnlineBookingCheckedDatetime,Origin,CustomerKey)
                VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
                ''',
                row.BookingKey,
                row.BusinessUnit,
                row.BusinessUnitKey,
                row.DateTime,
                row.Number,
                row.Reference,
                row.ExternalId,
                row.AmountTax,
                row.AmountTotal,
                row.AmountPaid,
                row.AmountOpen,
                row.AmountTotalExcludingTax,
                row.BookingFee,
                row.MerchantFee,
                row.ProcessorFee,
                row.NumberOfPersons,
                row.Status,
                row.StatusDateTime,
                row.StartTime,
                row.EndTime,
                row.PlannedCheckinTime,
                row.ActualCheckinTime,
                row.Attendance,
                row.AttendanceDatetime,
                row.OnlineBookingCheckedDatetime,
                row.Origin,
                row.CustomerKey
                )
conn.commit()
  • I think the question should be how to remove empty rows from a dataframe. It is not connected to the SQL insert. For iterating through a directory, a google search will give you multiple good answers. I would not delete the files automatically, until some verification tells you that you have everything and it's saved. – roadrunner66 Jun 17 '20 at 16:38
  • These files are copies, so I'm not particularly worried about deleting them. We will definitely make sure the data is coming across before we use this on a bigger scale though. Great point on removing the rows from the df, I'll start hunting that down. – Tricon Jun 17 '20 at 17:38

1 Answer


This ended up being really easy. I added the dropna function so that any row containing no data at all is dropped before the insert.

df = df.dropna(how = 'all')
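As a quick illustration of what `how='all'` does (the data below is made up, not from the question): rows where *every* column is NaN are removed, while rows with at least one value are kept.

```python
import pandas as pd

# two trailing all-empty rows, like the ",,,,,," lines at the end of the CSV
df = pd.DataFrame({"A": [1, None, None], "B": [2, None, None]})
cleaned = df.dropna(how="all")

# only the first row survives; the two all-NaN rows are dropped
print(len(cleaned))
```

Note that `dropna(how='all')` must run *before* `fillna(value=0)`, since filling NaN with 0 would make the empty rows look populated.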

Now off to find out how to iterate through multiple files in a directory and move them to another location.
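For the bonus question, a minimal sketch of that pattern using `pathlib.Path.glob` and `shutil.move` (the function name and folder arguments here are hypothetical, and the SQL insert from the question is left as a placeholder comment):

```python
import shutil
from pathlib import Path

import pandas as pd

def process_csv_folder(source_dir, processed_dir):
    """Read every .csv in source_dir, then move it to processed_dir."""
    source = Path(source_dir)
    processed = Path(processed_dir)
    processed.mkdir(parents=True, exist_ok=True)

    handled = []
    for csv_path in sorted(source.glob("*.csv")):
        df = pd.read_csv(csv_path).dropna(how="all")
        # ... run the INSERT loop from the question against df here ...
        shutil.move(str(csv_path), str(processed / csv_path.name))
        handled.append(csv_path.name)
    return handled
```

Moving files to a "processed" folder rather than deleting them keeps the safety margin roadrunner66 suggested in the comments: nothing is lost until you have verified the data landed in SQL.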
