I have a large list of tuples where each tuple contains 9 string elements:
```python
pdf_results = [
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/18/22', 'RC', '8', '0', '16', '8'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/18/22', 'SMI', '5', '0', '10', '5'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/19/22', 'RC', '8', '0', '16', '8'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/19/22', 'SMI', '5', '0', '10', '5'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/20/22', 'RC', '8', '0', '16', '8'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/20/22', 'SMI', '5', '0', '10', '5'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/21/22', 'RC', '8', '0', '16', '8'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/21/22', 'SMI', '5', '0', '10', '5'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/23/22', 'SMI', '5', '0', '10', '5'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/24/22', 'RC', '8', '0', '16', '8'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/24/22', 'SMI', '5', '0', '10', '5'),
    ('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/18/22', 'RC', '8', '0', '16', '8'),
    ('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/18/22', 'SMI', '5', '0', '10', '5'),
    ('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/19/22', 'RC', '8', '0', '16', '8'),
    ('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/19/22', 'SMI', '5', '0', '10', '5'),
]
```
Without using a pandas DataFrame, what is the best way to group by the first element of each tuple and sum the last element of each tuple? The output should look like this:
```python
desired_output = [
    ("Kohl's - Dallas", 70),
    ("Bronx-Lebanon Hospital Center", 26),
]
```
I've tried using `itertools.groupby`, which seems to be the most appropriate tool; however, I get stuck on iterating over each group, indexing, and summing the last element of each tuple without running into one of the following obstacles:

- The last element of each tuple is a `str`, and converting it to `int` prevents iteration: `TypeError: 'int' object is not iterable`
- A `ValueError` is raised: `invalid literal for int() with base 10: 'b'`
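Both errors can be reproduced in isolation. In this sketch, `values` is a hypothetical list of last elements from one group, and `int('b')` simply reproduces the quoted `ValueError` message. The common fix is to convert each element to `int` individually and pass `sum` an iterable of those ints, rather than calling `sum` on a single string or a single int.

```python
# Hypothetical last elements taken from one group of rows
values = ['8', '5', '8']

# Obstacle 1: sum() expects an iterable, so passing a single int fails
try:
    sum(int(values[0]))
except TypeError as exc:
    type_error = str(exc)
    print(type_error)  # 'int' object is not iterable

# Obstacle 2: int() rejects any string that is not a decimal number
try:
    int('b')
except ValueError as exc:
    value_error = str(exc)
    print(value_error)  # invalid literal for int() with base 10: 'b'

# Fix: convert each element first, then sum the resulting ints
total = sum(int(v) for v in values)
print(total)  # 21
```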
Attempt:
```python
from itertools import groupby

def getSiteName(siteChunk):
    return siteChunk[0]

siteNameGroup = groupby(pdf_results, getSiteName)

for key, group in siteNameGroup:
    print(key)  # 1st element of tuple as desired
    for pdf_results in group:
        # Raises TypeError: unsupported operand type(s) for +: 'int' and 'str'
        print(sum(pdf_results[8]))
    print()
```
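A corrected `groupby` version might look like the sketch below, shown on a small sample of the rows. It assumes, as in `pdf_results`, that all rows for a site are contiguous, because `groupby` only merges consecutive items with equal keys; for unordered data you would sort by the first element first (which would change the output order). `operator.itemgetter(0)` replaces `getSiteName`, and the inner sum runs over a generator of ints built from the whole group instead of over one string.

```python
from itertools import groupby
from operator import itemgetter

# A small sample of the rows from pdf_results
rows = [
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/18/22', 'RC', '8', '0', '16', '8'),
    ("Kohl's - Dallas", '-', "Kohl's Cafe", '03/18/22', 'SMI', '5', '0', '10', '5'),
    ('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/18/22', 'RC', '8', '0', '16', '8'),
    ('Bronx-Lebanon Hospital Center', '-', 'Patient Trayline ', '03/18/22', 'SMI', '5', '0', '10', '5'),
]

result = []
# Assumes rows for each site are already contiguous
for site, group in groupby(rows, key=itemgetter(0)):
    # Convert each row's last element to int, then sum over the whole group
    total = sum(int(row[-1]) for row in group)
    result.append((site, total))

print(result)
# [("Kohl's - Dallas", 13), ('Bronx-Lebanon Hospital Center', 13)]
```

Naming the inner loop variable `row` also avoids rebinding `pdf_results` inside the loop, which the original attempt does.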