I am working on text processing. Suppose I have one document that I want to compare against many other documents. I call the first document txt and each of the others pat.

This is my main procedure:

#read the document
txt = doc_gettext()

#read filename of other documents
filenames = doc.get_pat()

# iteration
d = int((len(txt) - 5 + 1) / k)

for i in range(1, len(filenames)):
    # open pattern one by one through the loop by name
    patname = filenames[i].replace('\n', '')
    with open (patname, 'r') as pattern:
        pattern = pattern.read().replace('\n', ' ').replace('\t', ' ')

    pattern = pattern.split()

    for j in range(k - 1):
        p = Process(target=all_position, args=(int(j * d), int((j+1) * d) + 5 - 1, pattern, txt, i, R,)) 
        processes.append(p)
        p.start()

    p = Process(target=all_position, args=(int(d * (k-1)), len(txt) + 5 - 1, pattern, txt, i, R,)) 
    processes.append(p)
    p.start()

    for pr in processes:
        pr.join()

I try to print them here, because I want to run an algorithm on them later on:

def all_position(x, y, pat, txt, i, R):
        #print pat
        print txt
        #print R.put(pat)


if __name__ == '__main__':
    main()

Suppose I store txt as a list of 20 tokens and print it from the all_position procedure. The output is:

['pe[[n''sppieelnn'ss, ii'llb''a, , k''abbraa'kk, aar'r'a', l, 'a'asal'la, as's'
r', a, 'm'rbrauamtmb'b, uu'ttt''a, , n''gttaaannn'gg, aa'nnm''a, , k''ammnaa'kk,
 aa'nnl''e, , m''allreeimm'aa, rr'iil''a, , n''tllaaainn'tt, aa'iis''e, , n''dss
aeelnn'dd, aa'llk''a, , k''ik'ka, ak'kiki'u', k, 'u'k'ku, uk'kupu'i', n, 't'pupi
'i, nn'ttpuue''l, , a''nppgeeill'aa, nn'ggmiii''n, , u''mmm'ii, nn'uummme''j, ,
a'''mm, ee'jjbaau''k, , u'''bb, uu'kkbuua''j, , 'ub''ab, ja'ujc'ue, l'acneal'a,
n''a, p'', lc'aespltlaianksa't', i, 'k'k'pe, lr'atksaetsir'k]t
'a, s''k]e
rtas']
['pensil', 'bakar', 'alas', 'rambut['', p'etnasnigla'n, '', b'amkaakra'[n, '''p,
 ae'lnlasesim'la, 'r', ir''ab, ma'bkluaatrn''t, , a''ita'al, na'gssa'en, n''d, r
a'almm'ab, ku'atkn'a', k, 'i't'la, en'mgkaaurnki'u', ', ', 'm'lapakinantnta'ui,
''', , l''epsmeealnradina'gl, i''', l, 'a'knmatikaniiu''m, , ''', ks'uemkneudj'a
a, l''', p, 'i'bnkutakuku'i', ', ', 'p'bekalujakunu'g', i, '''c, pe'ilmnaitnnuau
''m, , ''', pp'elmlaeasjntagi'ik, ''', , b''umkkieunr'ut, ma''sb, 'a']jm
ue'j, a''c, e'lbaunkau'', , ''bapjlua's, t'icke'l, a'nkae'r, t'apsl'a]s
tik', 'kertas']

Why does something like this happen? It is very confusing to me. Can somebody please help me fix it?

lloistborn
  • possible duplicate of [Python multithreaded print statements delayed until all threads complete execution](http://stackoverflow.com/questions/18234469/python-multithreaded-print-statements-delayed-until-all-threads-complete-executi) – BartoszKP Dec 16 '14 at 21:01
  • You have multiple processes trying to print to the same terminal without any form for coordination... what would you expect to happen? – thebjorn Dec 16 '14 at 21:01
  • possible duplicate of [How should I log while using multiprocessing in Python?](http://stackoverflow.com/questions/641420/how-should-i-log-while-using-multiprocessing-in-python) Essentially, all your processes are trying to print to the same stream at once. This naturally results in their outputs interleaving. You have to do something more robust if you're going to print output from multiple processes. – Henry Keiter Dec 16 '14 at 21:03
  • Actually, I don't have deep knowledge of multiprocessing, sir. I just want to check that the string is still the same in the all_position procedure after I pass it through the Process arguments. Please tell me anything I should know @thebjorn – lloistborn Dec 16 '14 at 21:07
  • I just read the article you linked, but it's still a bit confusing. The string output is fine when I implement it with Queue or Pipe, but then I need to pass it back to the main procedure. I just want to process it in the all_position procedure, but it returns something different from my string @HenryKeiter – lloistborn Dec 16 '14 at 21:16
  • This comment section is much too small for a tutorial on process synchronization, it's a complex topic. I would suggest having each process write to its own file. – thebjorn Dec 16 '14 at 21:38
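
The one-file-per-process suggestion above could look roughly like the sketch below. The file naming scheme (worker_N.txt), the out_dir parameter, and the helper names write_own_file and read_outputs are assumptions made for illustration; none of them come from the question's code.

```python
import os
import shutil
import tempfile
from multiprocessing import Process


def write_own_file(worker_id, tokens, out_dir):
    # Hypothetical naming scheme: one output file per worker.
    path = os.path.join(out_dir, 'worker_%d.txt' % worker_id)
    with open(path, 'w') as out:
        # Only this process ever writes to this file, so nothing can interleave.
        out.write('%r\n' % (tokens,))
    return path


def read_outputs(out_dir):
    # Read the files back one at a time, sorted by file name.
    lines = []
    for name in sorted(os.listdir(out_dir)):
        with open(os.path.join(out_dir, name)) as f:
            lines.append(f.read().rstrip('\n'))
    return lines


if __name__ == '__main__':
    tokens = ['pensil', 'bakar', 'alas']
    out_dir = tempfile.mkdtemp()
    try:
        processes = [Process(target=write_own_file, args=(i, tokens, out_dir))
                     for i in range(4)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        # The parent prints everything itself, so the terminal has one writer.
        for line in read_outputs(out_dir):
            print(line)
    finally:
        shutil.rmtree(out_dir)
```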

1 Answer

If you need safe printing you can use Lock objects.

Let's look at some code...

Not safe:

from multiprocessing import Lock, Process
import sys

# NOT SAFE
def not_safe_print(x):
    for i in range(10):
        # problem!
        print range(20)

# pool of 10 workers
processes = []
for i in range(10):
    processes.append(Process(target=not_safe_print, args=(i,)))

for p in processes:
    p.start()

for p in processes:
    p.join()

As we can see, two processes can be on the print statement at the same time. This is not "safe".

Suppose we have two processes (numbered 1 and 2), and each runs only a small piece of its work whenever the scheduler gives it some time. Process 1 may be interrupted after writing only part of a list to stdout, and process 2 then writes part of its own list right after it. When the stdout buffer is flushed, the pieces appear in the order they were written, and the output looks mangled.

Hopefully, when you run this script (you may have to run it a few times), you'll see mangled text like in your program.
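
If the mangling does not show up on your machine, the effect can be imitated in a single process. The sketch below is only a simulation: it assumes each writer emits fixed-size chunks and that the scheduler alternates strictly between the two, neither of which is guaranteed in a real run. The helper names chunked and interleave and the chunk size of 3 are made up for the illustration.

```python
def chunked(text, size):
    # Split the text into the pieces one writer would emit per time slice.
    return [text[i:i + size] for i in range(0, len(text), size)]


def interleave(chunks_a, chunks_b):
    # Alternate between the two writers, one chunk each, until both are done.
    out = []
    for i in range(max(len(chunks_a), len(chunks_b))):
        out.extend(chunks_a[i:i + 1])
        out.extend(chunks_b[i:i + 1])
    return ''.join(out)


if __name__ == '__main__':
    line = "['pensil', 'bakar', 'alas']\n"
    # Both "processes" try to print the same list at the same time.
    print(interleave(chunked(line, 3), chunked(line, 3)))
```

Every character of both lists is still present in the result, just in the wrong order, which matches the doubled and shuffled letters in the question's output.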

Making it "safe":

To make the script safe we have to limit access to shared resources like the stdout buffer (what you end up seeing on the terminal, though it could be a file as well). This is called mutual exclusion, and Lock objects provide a way to enforce it.

# used to implement a SAFE print
lock = Lock()
def safe_print(x):
    # when a process reaches this point it acquires the lock.
    # no process gets past this point without the lock, so only one can pass at a time
    lock.acquire()
    for i in range(10):
        print range(20)
    # when the process is done it releases the lock for other processes to grab
    # meaning another process can now use stdout (used by print...)
    lock.release()

Don't forget to change this line:

    processes.append(Process(target=safe_print, args=(i,)))
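
One caveat, offered as a sketch and not as part of the script above: a module-level lock is only shared with the children when they are forked. On Windows, multiprocessing starts each child by importing the main module again, so every child would build its own separate Lock and the printing would not be protected. Passing the lock to each process as an argument, and starting the processes under an `if __name__ == '__main__':` guard, avoids both issues. The extra lock parameter and the worker_id name are my additions.

```python
from multiprocessing import Lock, Process


def safe_print(worker_id, lock):
    # The with-block acquires the lock and always releases it,
    # even if printing raises an exception.
    with lock:
        for _ in range(10):
            print(list(range(20)))


if __name__ == '__main__':
    lock = Lock()  # created once in the parent and handed to every child
    processes = [Process(target=safe_print, args=(i, lock)) for i in range(10)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```

Holding the lock around the whole loop keeps each worker's ten lines together. Moving the with-block inside the loop would let workers take turns line by line while still keeping every line intact.
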
Reut Sharabani
  • Wow, your explanation is awesome and it solved my problem. You also taught me some new basics about mutual exclusion. Thank you, sir. Accepted @ReutSharabani – lloistborn Dec 16 '14 at 23:48