I have two lists:

l1 = ['a','b','c','d','e','f','g','h', ...]

l2 = ['dict1','dict2','dict3','dict4','dict5','dict6','dict7','dict8', ...]

I need to run a function on both lists 50 items at a time, and it must not move on until the function has returned the result for the current 50 items of each list.

My first idea was using a generator:

import threading

def list1_split(l1):
    n = 50
    for i in range(0, len(l1), n):
        yield l1[i:i+n]

def list2_split(l2):
    n = 50
    for i in range(0, len(l2), n):
        yield l2[i:i+n]

chunk_l1 = list1_split(l1)
chunk_l2 = list2_split(l2)
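Both generators do the same thing apart from the list they slice, so one generator that takes the list and the chunk size as parameters could serve both. A minimal sketch; the name `chunks` and the `size` parameter are illustrative, not part of the original code. Note that each value it yields is a list (a slice), not a single element:

```python
def chunks(seq, size=50):
    """Yield successive slices of seq, each at most `size` items long."""
    for start in range(0, len(seq), size):
        # Each yielded value is a whole slice (a list), not one element
        yield seq[start:start + size]


letters = ['a', 'b', 'c', 'd', 'e']
print(list(chunks(letters, size=2)))  # [['a', 'b'], ['c', 'd'], ['e']]
```

With this, `chunk_l1 = chunks(l1)` and `chunk_l2 = chunks(l2)` replace the two separate functions.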

Then I pass both generators to the function that uses them, which main() starts in a thread:

def query(chunk_l1, chunk_l2):

    count = 0
    query_return_dict = {}

    for i, j in zip(chunk_l2, chunk_l1):
        count += 1
        query_return_dict[i] = j
        print('This is number ', count, '\n', j)

    return query_return_dict


def main():
    thread = threading.Thread(target=query(chunk_l1, chunk_l2))
    thread.start()

    print('Done!')

if __name__ == '__main__':
    main()

The first error I get seems to be unrelated to the generators (I think):

TypeError: 'dict' object is not callable

But what really threw me off was that, when I stepped through it in the debugger, my for loop was treating each item as a whole list:

i: <class 'list'>: ['a','b','c','d','e',...]
j: <class 'list'>: ['dict1','dict2','dict3','dict4',...]

instead of i: 'a', j: 'dict1'. On top of that, I get this error:

TypeError: unhashable type: 'list'

I'm not too familiar with generators, but they seem like the most useful way to run a function one chunk at a time.

Sebastian Goslin
  • On which line do you get the error `TypeError: 'dict' object is not callable`? – Devesh Kumar Singh Jun 12 '19 at 19:33
  • I initially got it when I was using a generator for `chunk_l1 = list1_split(l1)` while keeping `l2` as a regular list, so I created two generators to see if that would fix it. Then, when I ran the debugger to see what it was doing, I noticed how it was interpreting the generator lists, which threw me off. – Sebastian Goslin Jun 12 '19 at 19:35
  • For starters, you are passing `None` to `threading.Thread(target=query(chunk_l1, chunk_l2))`, since `query` *always* returns `None`... but you need to provide a minimal, complete, and verifiable example. – juanpa.arrivillaga Jun 12 '19 at 19:38
  • Good call, same error though, I'll update the post – Sebastian Goslin Jun 12 '19 at 19:40
  • https://stackoverflow.com/questions/11792629/thread-starts-running-before-calling-thread-start – user2357112 Jun 12 '19 at 19:44
  • The `target` parameter must be a function. You're calling the `query()` function and passing its value (a dictionary), not passing the function itself. – Barmar Jun 12 '19 at 19:45
  • @SebastianGoslin ok, so now you are passing it a `dict` object when it expects a callable object (e.g. a function). Hence the error, `TypeError: 'dict' object is not callable`. – juanpa.arrivillaga Jun 12 '19 at 19:48
  • You're not actually processing anything in chunks. The `query()` function goes through all the chunks and puts everything into the `query_return_dict`. What's the point of the chunking? – Barmar Jun 12 '19 at 19:48

1 Answer


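To begin with, i and j are not strings, as you might have thought; they are lists themselves. Each generator yields a slice of up to 50 items, so every iteration of zip gives you one whole chunk from each generator.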
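You can see this by printing what zip produces from two chunking generators. A small sketch with a chunk size of 2 and shortened lists; the helper `split` is hypothetical and uses the same slicing logic as the question's generators:

```python
def split(seq, n):
    # Same slicing logic as list1_split / list2_split, with n as a parameter
    for i in range(0, len(seq), n):
        yield seq[i:i + n]


l1 = ['a', 'b', 'c', 'd']
l2 = ['dict1', 'dict2', 'dict3', 'dict4']

# zip pairs one chunk from each generator per iteration
pairs = list(zip(split(l1, 2), split(l2, 2)))
for i, j in pairs:
    print(type(i).__name__, i, j)
# list ['a', 'b'] ['dict1', 'dict2']
# list ['c', 'd'] ['dict3', 'dict4']
```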

When you do query_return_dict[i], you get TypeError: unhashable type: 'list' because you are using a list as a dictionary key. Dictionary keys must be hashable, and lists are mutable and therefore unhashable.
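A quick way to confirm this is to try both kinds of key directly. A minimal sketch; converting the chunk to a tuple is shown only to illustrate hashability and is not what the question needs:

```python
d = {}
chunk = ['a', 'b']
error_message = None

try:
    d[chunk] = 'value'       # a list cannot be used as a dict key
except TypeError as exc:
    error_message = str(exc)
    print(error_message)     # the "unhashable type" TypeError message

d[tuple(chunk)] = 'value'    # a tuple of strings is hashable
print(d)                     # {('a', 'b'): 'value'}
```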

To extract the strings from the lists, you need a second, inner for loop that zips i and j together element by element and fills query_return_dict:

def query(chunk_l1, chunk_l2):

    query_return_dict = {}

    # i and j are chunks (lists of up to 50 items)
    for i, j in zip(chunk_l1, chunk_l2):
        # Zip the two chunks together to pair up individual elements
        for key, value in zip(i, j):
            # Add each key/value pair to the dictionary
            query_return_dict[key] = value

    return query_return_dict
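Putting the pieces together, here is a self-contained sketch that builds the dictionary one chunk pair at a time. It assumes l1 holds the keys and l2 the values, as in the code above; `process_chunk` is a hypothetical function standing in for whatever work has to finish on one 50-item chunk before the next chunk is started:

```python
def split(seq, n=50):
    # Yield consecutive slices of at most n items
    for i in range(0, len(seq), n):
        yield seq[i:i + n]


def process_chunk(keys, values):
    # Hypothetical per-chunk work: pair keys with values for this chunk only
    return dict(zip(keys, values))


def query(chunk_l1, chunk_l2):
    query_return_dict = {}
    for keys, values in zip(chunk_l1, chunk_l2):
        # The loop does not advance until process_chunk has returned
        query_return_dict.update(process_chunk(keys, values))
    return query_return_dict


l1 = [f'k{n}' for n in range(120)]
l2 = [f'dict{n}' for n in range(120)]
result = query(split(l1), split(l2))
print(len(result), result['k0'], result['k119'])  # 120 dict0 dict119
```

Because the loop is sequential, each call to `process_chunk` finishes before the next chunk is read from the generators, which matches the "can't continue until the function has returned" requirement without any threading.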

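Also, thread = threading.Thread(target=query(chunk_l1, chunk_l2)) is not how you pass a function as the target of a thread. That line calls query immediately in the current thread and passes its return value (a dict) as the target, which is what causes TypeError: 'dict' object is not callable. Instead, pass the function itself and its arguments separately: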

thread = threading.Thread(target=query, args=(chunk_l1, chunk_l2))

From the docs: https://docs.python.org/3/library/threading.html#threading.Thread

target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.
args is the argument tuple for the target invocation. Defaults to ().
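Note that Thread discards the target's return value, so the dictionary returned by query would be lost. A minimal sketch, assuming the caller passes in a dictionary to fill (the extra `out` parameter is hypothetical) and uses join() to wait for the thread to finish before reading it:

```python
import threading


def split(seq, n=50):
    # Yield consecutive slices of at most n items
    for i in range(0, len(seq), n):
        yield seq[i:i + n]


def query(chunk_l1, chunk_l2, out):
    # Thread ignores return values, so results are written into `out`
    for keys, values in zip(chunk_l1, chunk_l2):
        for key, value in zip(keys, values):
            out[key] = value


l1 = ['a', 'b', 'c']
l2 = ['dict1', 'dict2', 'dict3']
results = {}

# Pass the function object as target and its arguments via args
thread = threading.Thread(target=query, args=(split(l1), split(l2), results))
thread.start()
thread.join()   # block until the thread has finished
print(results)  # {'a': 'dict1', 'b': 'dict2', 'c': 'dict3'}
```

If you need the return value itself, concurrent.futures.ThreadPoolExecutor is an alternative: submit() returns a Future, and its result() method waits for and returns what the function returned.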

Devesh Kumar Singh