I have a file (input.txt) containing half a million lines, and I want to encrypt each line with my encrypt function and save the results to a single file called output.txt. For example, if input.txt is
aab
abb
abc
Then I want to have my output.txt
to be
001
011
012
Simple for loop version
I have a working for loop; however, it takes nearly 9 hours to encrypt all the lines:
encryption_map = {}
encryption_map['a'] = 0
encryption_map['b'] = 1
encryption_map['c'] = 2

def encrypt(input_str):
    output_int = ''
    for i in input_str:
        for ch in i.split('\n')[0]:  # remove line break symbol \n
            output_int += str(encryption_map[ch])
    return output_int
text_path = 'input.txt'
with open(text_path, 'r') as input_file:
    lines = input_file.readlines()

with open('output.txt', 'w') as output_file:
    for l in lines:
        output_int = encrypt(l)
        output_file.write(output_int + '\n')
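For reference, the same per-line mapping can also be expressed with str.translate, which applies the whole table at once instead of concatenating one character at a time. encrypt_fast below is a sketch, under the assumption that every input character appears as a key of encryption_map:

```python
# Build the translation table once; translate() then maps every
# character of the line in a single call.
encryption_map = {'a': 0, 'b': 1, 'c': 2}
table = str.maketrans({ch: str(v) for ch, v in encryption_map.items()})

def encrypt_fast(line):
    # rstrip removes the trailing newline; translate maps all characters at once
    return line.rstrip('\n').translate(table)

print(encrypt_fast('abc\n'))  # prints 012
```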
apply_async version
Since I want to keep the same ordering in output.txt, it seems I have to use apply_async. Then my code becomes:
import multiprocessing as mp

encryption_map = {}
encryption_map['a'] = 0
encryption_map['b'] = 1
encryption_map['c'] = 2

def encrypt(input_str):
    output_int = ''
    for i in input_str:
        for ch in i.split('\n')[0]:  # remove line break symbol \n
            output_int += str(encryption_map[ch])
    return output_int

def write_result(output):
    output_file.write(ipa_output + '\n')
    # output_file.flush()  # This line is suggested by another Stack Overflow question

pool = mp.Pool(20)
text_path = 'input.txt'
with open(text_path, 'r') as input_file:
    lines = input_file.readlines()

with open('output.txt', 'w') as output_file:
    for l in lines:
        pool.apply_async(encrypt, args=l, callback=write_result)
    pool.close()
    pool.join()
It runs much faster; however, output.txt is always empty. What's wrong with my code? I found one post that also had difficulty writing out the file, and it suggests putting f.flush() inside the write function, but that doesn't work either.
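For comparison, here is a minimal sketch of ordered parallel encryption using Pool.imap, which yields results in input order; encrypt_lines_parallel is a hypothetical helper name, not part of my original code, and only the three example characters are assumed:

```python
import multiprocessing as mp

encryption_map = {'a': 0, 'b': 1, 'c': 2}

def encrypt(line):
    # join is linear time, unlike repeated string concatenation
    return ''.join(str(encryption_map[ch]) for ch in line.rstrip('\n'))

def encrypt_lines_parallel(lines, processes=4):
    # imap preserves input order, so output lines line up with input lines
    with mp.Pool(processes) as pool:
        return list(pool.imap(encrypt, lines, chunksize=256))

if __name__ == '__main__':
    print(encrypt_lines_parallel(['aab\n', 'abb\n', 'abc\n']))
```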