
I am trying to manipulate the lists inside the dictionary clean_txt from another function, but it's not working and I end up with empty lists inside the dict.

My understanding is that both lists and dicts are mutable objects, so what is the problem here?

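from multiprocessing import Process

# text_to_wordlist and timeit are my own helpers (not shown here)

def process_questions(i, question_list, questions, question_list_name):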
    ''' Transform questions and display progress '''
    print('processing {}: process {}'.format(question_list_name, i))
    for question in questions:
        question_list.append(text_to_wordlist(str(question)))

@timeit
def multi(n_cores, tq, qln):
    procs = []
    clean_txt = {}
    for i in range(n_cores):
        clean_txt[i] = []

    for index in range(n_cores):
        tq_indexed = tq[index*len(tq)//n_cores:(index+1)*len(tq)//n_cores]
        proc = Process(target=process_questions, args=(index, clean_txt[index], tq_indexed, qln, ))
        procs.append(proc)
        proc.start()

    for proc in procs:
        proc.join()

    print('{} records processed from {}'.format(sum([len(x) for x in clean_txt.values()]), qln))
    print('-'*100)
user3804483

2 Answers


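You are using processes, not threads.

When a process is created, the memory of your program is copied and each process works on its own copy, so nothing is shared between them. (With the spawn start method, the default on Windows and macOS, the child starts fresh and receives pickled copies of its arguments; the effect is the same.)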
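A minimal sketch of that behaviour, assuming nothing beyond the standard library: the child appends to its own copy of the list and the parent's list stays empty, whichever start method your platform uses. The function names here are hypothetical, chosen only for the illustration.

```python
from multiprocessing import Process


def append_in_child(items):
    # Runs in the child process: this modifies the child's own copy of the list.
    items.append("added by child")
    print("child sees:", items)


def main():
    shared = []  # an ordinary list, owned by the parent process
    proc = Process(target=append_in_child, args=(shared,))
    proc.start()
    proc.join()
    # The parent's list is untouched because the child worked on a copy.
    print("parent sees:", shared)
    return shared


if __name__ == "__main__":
    main()
```

The child prints a list containing the new item, while the parent still prints an empty list, which is exactly the empty-lists symptom from the question.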

Here's a question that can help you understand: Multiprocessing vs Threading Python

If you want to share data between processes, look into the shared objects in multiprocessing (Value, Array, or a Manager), using locks or semaphores to synchronize access, or use threads instead. There are also other ways to pass data around, such as queues or a database.
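As a sketch of the queue approach (an assumption about how you might restructure the code, not the only option), each worker can build its list locally and send it back to the parent through a multiprocessing.Queue. text_to_wordlist below is a hypothetical stand-in, since the question doesn't show the real helper.

```python
from multiprocessing import Process, Queue


def text_to_wordlist(text):
    # Hypothetical stand-in for the asker's helper, which the question doesn't show.
    return text.lower().split()


def process_questions(index, questions, results):
    # Build the list locally in the child, then send it back through the queue.
    cleaned = [text_to_wordlist(str(q)) for q in questions]
    results.put((index, cleaned))


def multi(n_cores, tq):
    results = Queue()
    procs = []
    for index in range(n_cores):
        chunk = tq[index * len(tq) // n_cores:(index + 1) * len(tq) // n_cores]
        proc = Process(target=process_questions, args=(index, chunk, results))
        procs.append(proc)
        proc.start()

    # Drain the queue before joining: a child that still has queued data
    # may not exit until that data has been consumed.
    clean_txt = dict(results.get() for _ in range(n_cores))

    for proc in procs:
        proc.join()
    return clean_txt


if __name__ == "__main__":
    questions = ["What is Python?", "How do queues work?", "Why is my list empty?"]
    clean_txt = multi(2, questions)
    print(sum(len(x) for x in clean_txt.values()), "records processed")
```

Unlike threads, this keeps the work in separate processes, so CPU-bound text cleaning is not limited by the GIL; the cost is that each result list is pickled once on its way back to the parent.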

Maresh

You are appending to clean_txt[index] from a different process. clean_txt[index] belongs to the main Python process that created it. Since a process can't access or modify another process's memory, you can't append to it. (Not quite; see the edit below.)

You will need to create shared memory.

You can use a Manager to create lists that can be shared between processes, something like this:

from multiprocessing import Manager
manager = Manager()
...
    clean_txt[i] = manager.list()

Now you can append to this list in the other process.
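Here is a fuller sketch of the question's multi function using manager lists. It assumes a hypothetical stand-in for text_to_wordlist and drops the asker's @timeit decorator, since neither is shown in the question. The results are copied into plain lists before the manager shuts down, because the proxies stop working once the manager process exits.

```python
from multiprocessing import Manager, Process


def text_to_wordlist(text):
    # Hypothetical stand-in for the asker's helper.
    return text.lower().split()


def process_questions(i, question_list, questions, question_list_name):
    """Transform questions and display progress."""
    print("processing {}: process {}".format(question_list_name, i))
    for question in questions:
        # question_list is a ListProxy: append() is forwarded to the manager process.
        question_list.append(text_to_wordlist(str(question)))


def multi(n_cores, tq, qln):
    with Manager() as manager:
        # One manager-backed list per worker instead of plain lists.
        clean_txt = {i: manager.list() for i in range(n_cores)}
        procs = []
        for index in range(n_cores):
            tq_indexed = tq[index * len(tq) // n_cores:(index + 1) * len(tq) // n_cores]
            proc = Process(target=process_questions,
                           args=(index, clean_txt[index], tq_indexed, qln))
            procs.append(proc)
            proc.start()
        for proc in procs:
            proc.join()
        # Copy the proxies into ordinary lists before the manager shuts down.
        result = {i: list(lst) for i, lst in clean_txt.items()}

    print("{} records processed from {}".format(
        sum(len(x) for x in result.values()), qln))
    return result


if __name__ == "__main__":
    multi(2, ["What is Python?", "How do queues work?", "Why is my list empty?"], "demo")
```

Every append is a round trip to the manager process, so for large inputs returning whole lists (for example through a queue) is usually cheaper than appending item by item.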

Edit:

My explanation about clean_txt wasn't clear. Thanks to @Maresh.

When a new process is created, the whole memory is copied, so modifying the list in the new process won't affect the copy in the main process. That is why you need shared memory.

Shreyash S Sarnayak