Pre-calculating the values of a dictionary with tuple as its value

Question

I have the following code example where I am creating a dictionary result for each id with a tuple of 2 numbers for its value.

from glob import glob

# populate the ids list (contents of the current directory which is for a specific id name)
ids_list = [filename for filename in glob('*.txt')]

def some_numerical_calc(filename):
    # calculates and returns some number as string
    pass

def size_of_file(filename):
    # calculates and returns size number as string
    pass

def count_stuff(id, filename):
    result = { id: (some_numerical_calc(filename), size_of_file(filename)) }

for id in ids_list:
    for f in files_list:
        count_stuff(id, f)

The idea is that I will eventually aggregate all these dictionary key-value pairs under one dictionary (perhaps this part needs redesigning).

The problem I am dealing with is the case where the files_list of a specific id has more than one entry; in these cases I would like the 2 numbers computed for each filename to be added to the running totals for that same id.

As an example,

ids_list = ['001', '002', '003']

where for id='001' it has files_list=['file1.txt', 'file2.txt', 'file3.txt']

and if

some_numerical_calc('file1.txt') gives 10 and size_of_file('file1.txt') gives 80,

some_numerical_calc('file2.txt') gives 150 and size_of_file('file2.txt') gives 35,

some_numerical_calc('file3.txt') gives 30 and size_of_file('file3.txt') gives 120,

then, I would expect the output for id='001' to be result = { '001': (190, 235) }

I know that tuples are immutable. I am struggling to come up with an implementation that pre-computes the 2 numbers for all files of each id and then creates its specific dictionary entry. Alternatively, perhaps I should drop the tuple structure (even though I was hoping to use namedtuples) and store the 2 numbers in a set (?). Any suggestions would be much appreciated.

Hoping for efficient and pythonic suggestions.

Use a list, change the values while you are using it, and cast every value to a namedtuple when you are done. Alternatively, you could use `sum(some_numerical_calc(f) for f in file_list), sum(size_of_file(f) for f in file_list)` and then use those as the two values in the tuple. – Tadhg McDonald-Jensen, Jun 17 '16 at 19:58
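A minimal sketch of the first suggestion in the comment above, assuming the helpers return numeric strings as the question's stubs say. The `Totals` type, its field names, the `total_for_files` helper, and the lookup tables standing in for the real helpers are all hypothetical; only the numbers come from the question.

```python
from collections import namedtuple

# Hypothetical record type; the field names are illustrative only.
Totals = namedtuple("Totals", ["calc", "size"])

def total_for_files(file_names, calc_fn, size_fn):
    """Accumulate both numbers in a mutable list, then freeze the result."""
    running = [0, 0]
    for name in file_names:
        # int() because the question's helpers return numbers as strings
        running[0] += int(calc_fn(name))
        running[1] += int(size_fn(name))
    return Totals(*running)

# Stand-in data using the numbers from the question (assumed, not real files).
fake_calc = {"file1.txt": "10", "file2.txt": "150", "file3.txt": "30"}
fake_size = {"file1.txt": "80", "file2.txt": "35", "file3.txt": "120"}

result = {"001": total_for_files(sorted(fake_calc), fake_calc.get, fake_size.get)}
print(result["001"])  # Totals(calc=190, size=235)
```

Because a namedtuple is still a tuple, `result["001"] == (190, 235)` holds, and the fields can also be read as `result["001"].calc` and `result["001"].size`.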

score 1 · Answer 1 · answered Jun 17 '16 at 20:04

Part of the problem is you've organized your code badly. You're creating your dictionary too early.

Consider if you reorganized it something like this:

def count_stuff(id, filename):
    return (some_numerical_calc(filename), size_of_file(filename))

for id in ids_list:
    nums = 0
    sizes = 0
    for f in files_list:
        num, size = count_stuff(id, f)
        nums += num
        sizes += size
    result = { id: (nums, sizes) }

Now your dictionary is created after you've aggregated your data.

score 0 · Answer 2 · edited May 23 '17 at 11:52

To create a tuple that contains sums, you can do something like this*:

result[id] = (sum(some_numerical_calc(filename) for filename in files_list),
              sum(size_of_file(filename) for filename in files_list))

But just as a heads up, using your current code, this would store the same tuple value in the dict for all of your id keys. You don't currently have any way of associating a particular files_list with a particular id.

*If you only want to iterate once through files_list instead of twice, you can adapt one of the answers from here: Python element-wise tuple operations like sum.
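One possible single-pass variant, sketched under the assumption that the helpers return plain numbers (wrap them in `int()` if they return strings). The `sum_pairs` helper and the lookup tables standing in for the real helpers are hypothetical; the numbers are the ones from the question.

```python
def sum_pairs(pairs):
    """Element-wise sum of an iterable of 2-tuples, consuming it only once."""
    # zip(*pairs) transposes [(a1, b1), (a2, b2)] into (a1, a2), (b1, b2)
    columns = zip(*pairs)
    totals = tuple(sum(col) for col in columns)
    # For empty input zip yields nothing, so fall back to a pair of zeros.
    return totals or (0, 0)

# Stand-ins for some_numerical_calc and size_of_file (assumed values).
fake_calc = {"file1.txt": 10, "file2.txt": 150, "file3.txt": 30}
fake_size = {"file1.txt": 80, "file2.txt": 35, "file3.txt": 120}
files_list = ["file1.txt", "file2.txt", "file3.txt"]

# Each helper is called exactly once per file.
pairs = ((fake_calc[f], fake_size[f]) for f in files_list)
result = {"001": sum_pairs(pairs)}
print(result)  # {'001': (190, 235)}
```

The trade-off is that `zip(*pairs)` materializes all pairs in memory before summing, which is fine for a folder of files but not for very long streams.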

score 0 · Answer 3 · Kal Zekdor · edited Jun 17 '16 at 22:04

You could try mixing map and sum:

resultDict = {}

for id in ids_list:
    resultDict[id] = (sum(map(some_numerical_calc, files_list[id])), sum(map(size_of_file, files_list[id])))

Edit:

A more detailed example given your particular situation. Some parts will be described in comments in angle brackets.

from glob import glob

#<Get list of ids as strings>
files_list = {} #Initialize the files dictionary.

for id in ids_list:
    #<Switch to directory based on id>
    files_list[id] = [filename for filename in glob('*.txt')]

def some_numerical_calc(filename):
    # calculates and returns some number as string
    pass

def size_of_file(filename):
    # calculates and returns size number as string
    pass

resultDict = {} #Init results.

for id in ids_list:
    resultDict[id] = (sum(map(some_numerical_calc, files_list[id])), sum(map(size_of_file, files_list[id])))
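For a runnable illustration of the `map` and `sum` combination, here is a small sketch that assumes the helpers return plain numbers. The lookup tables replace the real helpers, `files_by_id` is a renamed `files_list`, and `file_id` is used to avoid shadowing the built-in `id`. Only the values for '001' come from the question; the rest are made up.

```python
# Hypothetical stand-ins for the real helpers.
CALC = {"file1.txt": 10, "file2.txt": 150, "file3.txt": 30,
        "file4.txt": 5, "file5.txt": 7, "file6.txt": 1}
SIZE = {"file1.txt": 80, "file2.txt": 35, "file3.txt": 120,
        "file4.txt": 20, "file5.txt": 40, "file6.txt": 9}

def some_numerical_calc(filename):
    return CALC[filename]

def size_of_file(filename):
    return SIZE[filename]

# Each id maps to its own list of files.
files_by_id = {
    "001": ["file1.txt", "file2.txt", "file3.txt"],
    "002": ["file4.txt", "file5.txt"],
    "003": ["file6.txt"],
}

# map takes the function and the iterable as two separate arguments.
result_dict = {
    file_id: (sum(map(some_numerical_calc, names)), sum(map(size_of_file, names)))
    for file_id, names in files_by_id.items()
}
print(result_dict["001"])  # (190, 235)
```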

edited Jun 17 '16 at 22:04

answered Jun 17 '16 at 20:09

Kal Zekdor


Trying to understand your answer which I find fascinating as an idea. I do not get the `files_list[id]` part. I mean, what will a list return when provided an index which happens to be the key of an unrelated dictionary? Perhaps if you could add a working example? – Jun 17 '16 at 21:41
Your explanation seemed to indicate that the files_list was a two-dimensional data set. That is, each id had its own set of files. So, files list would look something like: `files_list = {'001': ['file1.txt', 'file2.txt', 'file3.txt'], '002': ['file4.txt', 'file5.txt'], '003': ['file6.txt']}`. So, `files_list[id]` would yield the list of files associated with the given id. – Kal Zekdor Jun 17 '16 at 21:46
Apologies if I did not make it clearer; no, the `files_list` is a list of filenames for a specific `id`. Every `id` has its own `files_list`, like the contents of a folder. Do you think this elegant `sum` & `map` combo could be implemented now? – Jun 17 '16 at 21:49
Can you edit your question to include how files_list is populated? – Kal Zekdor Jun 17 '16 at 21:51
Your code uses `for id in ids_list: for f in files_list:` That uses the same `files_list` variable for every id, so I assumed it was a two-dimensional data set. – Kal Zekdor Jun 17 '16 at 21:53
I see what caused the confusion. I edited the question to (hopefully) provide what was missing. – Jun 17 '16 at 21:55
It took me a while but it does make sense. Thanks for adapting it to demonstrate the use of `map`. Even though changing the `files_list` is not an option for me, it is still a very good example. – Jun 18 '16 at 09:02

score 0 · Answer 4 · answered Jun 17 '16 at 20:09

There are a few strange things about your code. For one, you are creating a new dictionary every time you call count_stuff, but you never do anything with it or return it. From the question it seems like you want everything added to one dictionary.

Something like this might work better:

def some_numerical_calc(filename):
    # calculates and returns some number as string
    pass

def size_of_file(filename):
    # calculates and returns size number as string
    pass

def count_stuff(id, file_list):
    some_number = 0
    size = 0
    for filename in file_list:
        some_number += some_numerical_calc(filename)
        size += size_of_file(filename)
    return (some_number, size)

results = {}
for id in ids_list:
    results[id] = count_stuff(id, file_list)
print(results)

You are right. I failed to clarify that; I intend to have all dictionaries under one dictionary, which will hold `id` as keys and the numbers as values. (Modified the question to mention this.) – Jun 17 '16 at 20:11
