I've been able to extract data from two separate xlsx and combine them into a single xlsx sheet using pandas.
I know have a table that looks like this.
Home Start Date Gross Earning Tax Gross Rental Commission Net Rental
3157 2020-03-26 00:00:00 -268.8 -28.8 -383.8 -36 -338.66
3157 2020-03-26 00:00:00 268.8 28.8 153.8 36 108.66
3157 2020-03-24 00:00:00 264.32 28.32 149.32 35.4 104.93
3157 2020-03-13 00:00:00 625.46 67.01 510.46 83.7675 405.4225
3157 2020-03-13 00:00:00 558.45 0 443.45 83.7675 342.9325
3157 2020-03-11 00:00:00 142.5 0 27.5 21.375 1.855
3157 2020-03-11 00:00:00 159.6 17.1 44.6 21.375 17.805
3157 2020-03-03 00:00:00 349.52 0 234.52 52.428 171.612
3157 2020-03-03 00:00:00 391.46 41.94 276.46 52.428 210.722
So if you take a look at the first two rows, the name in the Home column is the same (In this example, 3157 Tocoa) but they are also the same for the next few rows. But in the Start date column, only the first two items in that column are the same (In this case 3/26/2020 12:00:00 AM) So what i need to do is the following
If the dates are the same, and the Home is the same, then I need the sum of all of the following columns. (In this case, I would need the sum of -268.8 and 268.8, the sum of -28.8 and 28.8 and so on) It is also important to mention there are instances where there is a total of more than two matching start dates.
I will include the code I have used to get to where I am now, I would like to mention I am fairly new to python so I'm sure there is a way to do this super simple but I am just not familiar. I am also new to stackoverflow so if I am missing something or added something I should have please forgive me
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
import numpy as np
import matplotlib.pyplot as plt
import os
# class airbnb:
#Gets the location path for the reports that come raw from the channel
airbnb_excel_file = (r'C:\Users\Christopher\PycharmProjects\Reporting with
python\Data_to_read\Bnb_feb_report.xlsx')
empty_excel_file = (r'C:\Users\Christopher\PycharmProjects\Reporting with
python\Data_to_read\empty.xlsx')
#Defines the data frame
df_airbnb = pd.read_excel(airbnb_excel_file)
df_empty = pd.read_excel(empty_excel_file)
gross_earnings = df_airbnb['Gross Earnings']
tax_amount = df_airbnb['Gross Earnings'] * 0.06
gross_rental = df_airbnb['Gross Earnings'] - df_airbnb['Cleaning Fee']
com = ((gross_rental - tax_amount) + df_airbnb['Cleaning Fee']) * 0.15
net_rental = (gross_rental - (com + df_airbnb['Host Fee']))
house = df_airbnb['Listing']
start_date = df_airbnb['Start Date']
# df = pd.DataFrame(df_empty)
# df_empty.replace('nan', '')
#
# print(net_rental)
df_report = pd.DataFrame(
{'Home': house, 'Start Date': start_date, 'Gross Earning': gross_earnings, 'Tax': tax_amount,
'Gross Rental': gross_rental, 'Commission': com, 'Net Rental': net_rental})
df_report.loc[(df_report.Home == 'New house, Minutes from Disney & Attraction'), 'Home'] = '3161
Tocoa'
df_report.loc[(df_report.Home == 'Brand-New House, located minutes from Disney 5151'), 'Home'] =
'5151 Adelaide'
df_report.loc[(df_report.Home == 'Luxury House, Located Minutes from Disney-World 57'), 'Home'] =
'3157 Tocoa'
df_report.loc[(df_report.Home == 'Big house, Located Minutes from Disney-World 55'), 'Home'] = '3155
Tocoa'
df_report.sort_values(by=['Home'], inplace=True)
# writer = ExcelWriter('Final_Report.xlsx')
# df_report.to_excel(writer, 'sheet1', index=False)
# writer.save()
# class homeaway:
homeaway_excel_file = (r'C:\Users\Christopher\PycharmProjects\Reporting with
python\Data_to_read\PayoutSummaryReport2020-03-01_2020-03-29.xlsx')
df_homeaway = pd.read_excel(homeaway_excel_file)
cleaning = int(115)
house = df_homeaway['Address']
start_date = df_homeaway['Check-in']
gross_earnings = df_homeaway['Gross booking amount']
taxed_amount = df_homeaway['Lodging Tax Owner Remits']
gross_rental = (gross_earnings - cleaning)
com = ((gross_rental-taxed_amount) + cleaning) * 0.15
net_rental = (gross_rental - (com + df_homeaway['Deductions']))
df_report2 = pd.DataFrame(
{'Home': house, 'Start Date': start_date, 'Gross Earning': gross_earnings, 'Tax': taxed_amount,
'Gross Rental': gross_rental, 'Commission': com, 'Net Rental': net_rental})
# writer = ExcelWriter('Final_Report2.xlsx')
# df_report2.to_excel(writer, 'sheet1', index=False)
# writer.save()
df_combined = pd.concat([df_report, df_report2])
writer = ExcelWriter('Final_Report_combined.xlsx')
df_report2.to_excel(writer, 'sheet1', index=False)
writer.save()