I based my script largely on chitown88's answer to this question. The script pulls lock (i.e. lock and dam) data from XML files on the Army Corps of Engineers website using BeautifulSoup, builds a Pandas table from that data, one table per lock, collects the tables in a list, and finally writes each table to its own Excel sheet. It works perfectly fine. However, I now need help summing the 'Number of Barges' column on each sheet.
I would like a 'Total' row at the bottom of each sheet in which only the 'Number of Barges' column is summed. Neither of my attempts put the sum at the bottom; instead, each value from that column was just repeated in a new row.
(1) I have tried
sheet = sheet.append(df,sort=True).reset_index(drop=True)
sheet.loc['Total'] = pd.Series(df['Number of Barges'].sum(), index = ['Number of Barges'])
(2) And I have tried (as seen in my script below):
df.loc['Total'] = pd.Series(df['Number of Barges'].sum(), index = ['Number of Barges'])
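I suspect part of the problem is that both attempts run inside the per-row loop (so each one-row df gets its own 'Total'), and that 'Number of Barges' holds strings, since it comes from .text.strip(), so .sum() concatenates text rather than adding numbers. A minimal sketch on made-up values (the toy data is mine, not from the site) of what I'm expecting:

```python
import pandas as pd

# Toy stand-in for one finished sheet; the counts arrive as strings
# because they come from .text.strip() on the XML.
sheet = pd.DataFrame({'Name': ['A', 'B'], 'Number of Barges': ['3', '9']})

# Summing the string column concatenates instead of adding.
print(sheet['Number of Barges'].sum())  # '39', not 12

# Converting to numbers first, then appending one 'Total' row after the
# whole sheet is built, gives the total I actually want.
sheet['Number of Barges'] = pd.to_numeric(sheet['Number of Barges'])
sheet.loc['Total'] = pd.Series(sheet['Number of Barges'].sum(),
                               index=['Number of Barges'])
print(sheet.loc['Total', 'Number of Barges'])
```

The columns not named in the Series (Name) come out as NaN in the Total row, which is what I want.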
My Script:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from pandas import ExcelWriter
from datetime import datetime
import os
#set the request headers to mimic a browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
#set up file name
file_path = r"\\pathto\BargeTraffic"
excel_file = 'LockQueueReport_TEST' + str(datetime.now().strftime('_%m_%d_%Y')) + '.xlsx'
excel_file_full = os.path.join(file_path, excel_file)
#create a list of locks that will be used as three separate tables
lockName = ['Lockport Lock', 'Brandon Rd Lock', 'Dresden Island Lock']
lockNo = ['02', '03', '04']
results = []
for lock in lockNo:
    url = 'https://corpslocks.usace.army.mil/lpwb/xml.lockqueue?in_river=IL&in_lock=' + lock
    #print (url)
    link = requests.get(url, headers=headers).text
    soup = BeautifulSoup(link, 'lxml')
    #grab each row, pull the data
    rows = soup.find_all('row')
    #take those rows on the site and create a table, and make a list of tables (one table for each lock)
    sheet = pd.DataFrame()
    for row in rows:
        name = row.find('vessel_name').text.strip()
        no = row.find('vessel_no').text.strip()
        dir = row.find('direction').text.strip()
        barno = row.find('num_barges').text.strip()
        arr = row.find('arrival_date').text.strip()
        #because this field could have no value, put it in a try/except block
        try:
            end = row.find('end_of_lockage').text.strip()
        except AttributeError:
            end = ''
        df = pd.DataFrame([[name, no, dir, barno, arr, end]], columns=['Name', 'Vessel No.', 'Direction', 'Number of Barges', 'Arrival', 'End of Lockage'])
        df.loc['Total'] = pd.Series(df['Number of Barges'].sum(), index=['Number of Barges'])  # attempt (2)
        sheet = sheet.append(df, sort=True).reset_index(drop=True)
    results.append(sheet)
#function that takes that list of tables and write them into separate excel sheets
def save_xls(list_dfs, xls_path):
    with ExcelWriter(xls_path) as writer:
        for n, df in enumerate(list_dfs):
            df.to_excel(writer, lockName[n], index=False)
        writer.save()
save_xls(results, excel_file_full)
print('----done----')
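For clarity, here is a stripped-down, self-contained version of the pattern I'm after. Fake rows stand in for the parsed XML, and my two assumptions are converting the counts with pd.to_numeric and adding the Total row once per sheet, after the inner rows are collected:

```python
import pandas as pd

# Fake "locks" with (name, barge-count) pairs standing in for the XML;
# counts are strings, just as they come out of .text.strip().
fake_locks = {
    'Lockport Lock': [('CAROL', '3'), ('JOLIET', '9')],
    'Brandon Rd Lock': [('MAMIE', '6')],
}

results = []
for lock, rows in fake_locks.items():
    sheet = pd.DataFrame(rows, columns=['Name', 'Number of Barges'])
    # counts arrive as text, so make them numeric before summing
    sheet['Number of Barges'] = pd.to_numeric(sheet['Number of Barges'])
    # one Total row per sheet, added after all rows are collected
    sheet.loc['Total'] = pd.Series(sheet['Number of Barges'].sum(),
                                   index=['Number of Barges'])
    results.append(sheet)

print(results[0])
```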
Result (each value from the 'Number of Barges' column repeated in a new row instead of one Total at the bottom):