0

I am a newbie into python and I am trying to optimize a snippet of my program from using two for loops to using the list comprehension in Python3. While doing this, I am unable to calculate the sum of more than one column. For Ex, if I have columns 1,2, and 3 of types float int, and string in a dictionary, I am able to calculate the sum only for column one and not for column 2.

The double for loop looks something like this

final_dict = []
    for k, g in itertools.groupby(temp_dict, key=lambda x: (x['transaction_category_id'])):
        txn_amount = 0
        distinct_user_count = 0
        for v in g:
            # print(k)
            txn_amount += float(v['transaction_amount'])
            distinct_user_count += v['user_id_count']
            # print(v)

        final_dict.append({'transaction_category_id': k, 'aggregated_transaction_amount': txn_amount,
                           'distinct_user_count': distinct_user_count})

The code I want to optimise to should ideally look something like this :

final_result = [[k, sum(float(v['transaction_amount']) for v in g),sum(s['user_id_count'] for s in g)] for k, g in
                    itertools.groupby(temp_dict, key=lambda x: (x['transaction_category_id']))]

But the code does not add up values for the user_id_count column and return sum as 0.

The sample data looks something like this :

user_id,transaction_amount,transaction_category_id
b2d30a62-36bd-41c6-8221-987d5c4cd707,63.05,3
b2d30a62-36bd-41c6-8221-987d5c4cd707,13.97,4
b2d30a62-36bd-41c6-8221-987d5c4cd707,97.15,4
b2d30a62-36bd-41c6-8221-987d5c4cd707,23.54,5

and the ideal output would look like :

['4', 111.12, 2],
['3', 63.05, 1],
['5', 23.54, 1],

but it prints out and the ideal output would look like :

['4', 111.12, 0],
['3', 63.05, 0],
['5', 23.54, 0],

I tried the below sample code out but the output is not what I expected :

final_result = [[k, sum(float(v['transaction_amount']) for v in g),sum(s['user_id_count'] for s in g)] for k, g in
                    itertools.groupby(temp_dict, key=lambda x: (x['transaction_category_id']))]
FoxFil
  • 320
  • 10
  • What does the question have to do with optimization? Also, **what is in `temp_dict`**? Don't show us a data file; show us the structured data that is the input **to this part of the program**. – Karl Knechtel Jan 27 '23 at 08:40
  • Reducing the number of lines while also reducing readability can hardly be called "optimising". Remember, strangers (including your future self) will know nothing about this clever comprehension and have to dissect it to understand. – ljmc Jan 27 '23 at 08:41
  • Does [Comprehension on nested iterables](https://stackoverflow.com/questions/18072759) help? – Karl Knechtel Jan 27 '23 at 08:42

2 Answers2

0

You seem to have a very simple comma-delimited CSV file. To get output similar to that shown in the question you could do this:

from collections import defaultdict

FILENAME = '/Volumes/G-Drive/sample.csv'

def gendict():
    return {'amount': 0.0, 'count': 0}

summary = defaultdict(gendict)

with open(FILENAME) as csv:
    next(csv, None) # skip column headers
    for line in map(str.rstrip, csv):
        *_, _amount, _id = line.split(',')
        summary[_id]['amount'] += float(_amount)
        summary[_id]['count'] += 1

slist = [[k, *v.values()] for k, v in summary.items()]

print(slist)

Output:

[['3', 63.05, 1], ['4', 111.12, 2], ['5', 23.54, 1]]
DarkKnight
  • 19,739
  • 3
  • 6
  • 22
0

Alternatively, you can use csv.DictReader with collections.defaultdict to simulate the groupby.agg of :

from csv import DictReader
from collections import defaultdict

out = defaultdict(lambda: [0,0])

with open("tmp/Ashish Shetty.csv", "r") as csvfile:
    reader = DictReader(csvfile)
    for row in reader:
        tra_id = str(row["transaction_category_id"])
        out[tra_id][0] += float(row["transaction_amount"])
        out[tra_id][1] += 1

out = [[k]+v for k,v in out.items()]

Output :

print(out)
#[['3', 63.05, 1], ['4', 111.12, 2], ['5', 23.54, 1]]
Timeless
  • 22,580
  • 4
  • 12
  • 30