I want to complete the following task.
I have an input TSV file:
0 2 0
2 5 1
5 10 2
10 14 5
I want to convert it to the following format, where each depth value is repeated once per position between start and stop:
0
0
1
1
1
2
2
2
2
2
5
5
5
5
I managed to do this with the following code (start is the first column of the input file, stop is the second, and depth is the third):
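from tqdm import tqdm

def parse(i):
    # i is one row, already split into [start, stop, depth]
    start = int(i[0])
    stop = int(i[1])
    depth = i[2]
    times = stop - start
    # repeat depth once per position in [start, stop)
    return times * [depth]

signal = []
# file is assumed to be an iterable of rows, each already split on tabs
for i in tqdm(file):
    signal.extend(parse(i))  # extend keeps signal a flat list

with open('output.txt', 'w') as f:
    for item in signal:
        f.write("%s\n" % item)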
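A variant I am also considering, to avoid holding the whole signal in memory, is to write each run straight to the output as it is read. This is only a minimal sketch: the name `expand_rows` is hypothetical, and it assumes each line holds start, stop and depth separated by whitespace.

```python
def expand_rows(lines, out):
    """Write depth once per position in [start, stop) for every input line."""
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, stop, depth = fields[:3]
        times = int(stop) - int(start)
        if times > 0:
            # one write per input row instead of one per output line
            out.write((depth + "\n") * times)
```

It would be called with two open file objects, for example `expand_rows(src, dst)` inside a `with open(...) as src, open(..., 'w') as dst:` block.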
However, my input file has 16720973 lines and I have many such files, so I tried to run the parsing in parallel processes to reduce the execution time, with the following code:
import multiprocessing as multip  # multip is assumed to be this alias

def parse(start, stop, depth):
    times = int(stop) - int(start)
    return times * [depth]

signal = []
poolv = multip.Pool(20)
x = [poolv.apply(parse, args=(i[0], i[1], i[2])) for i in tqdm(file)]
signal.append(x)
poolv.close()
However, there was no difference in execution time, and I suspect that no parallel processing actually took place. Is there a mistake in my code, or is there a better way to solve this problem that reduces the execution time?
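From the documentation, `Pool.apply` blocks until its result is ready, so the list comprehension above submits one row, waits for it, and only then submits the next. That would explain why nothing runs concurrently. Each row is also very little work compared with the cost of sending it to a worker. Since I have many files, one direction might be to parallelise per file rather than per row. The following is only a rough sketch under that assumption: `convert_file`, `convert_all` and the `.out` suffix are hypothetical names, not part of my current code.

```python
import multiprocessing as multip


def convert_file(in_path):
    """Expand one input file into in_path + '.out' and return the output path."""
    out_path = in_path + ".out"  # hypothetical naming scheme
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()  # splits on tabs or spaces
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            start, stop, depth = fields[:3]
            times = int(stop) - int(start)
            if times > 0:
                dst.write((depth + "\n") * times)
    return out_path


def convert_all(paths, processes=None):
    """Convert many files concurrently, one worker task per file."""
    if not paths:
        return []
    with multip.Pool(processes) as pool:
        # map() distributes whole files across the workers and
        # returns once every file has been converted
        return pool.map(convert_file, paths)
```

`convert_all` would need to be called from under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that start workers by spawning a new interpreter. Whether this is actually faster would depend on disk throughput, since the work per file is mostly reading and writing.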