Creating tuples doesn't seem to be especially slow
I don't think tuple creation is the performance bottleneck in this question (profiled on a run that reads a 6.2 MB text file).
Code:
```python
import cProfile

def to_tuple(l):
    return tuple(l)

with open('input.txt', 'r') as f:
    lines = f.readlines()

cProfile.run("lines = [to_tuple(line.strip().split()) for line in lines]")
```
Profile result:
```
$ time python3 tuple-perf.py
         385375 function calls in 0.167 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.072    0.072    0.165    0.165 <string>:1(<listcomp>)
        1    0.002    0.002    0.166    0.166 <string>:1(<module>)
   128457    0.017    0.000    0.017    0.000 tuple-perf.py:5(to_tuple)
        1    0.000    0.000    0.167    0.167 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
   128457    0.062    0.000    0.062    0.000 {method 'split' of 'str' objects}
   128457    0.013    0.000    0.013    0.000 {method 'strip' of 'str' objects}
```
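As a cross-check, here is a quick `timeit` micro-benchmark (a sketch; the sample line is made up and absolute numbers will vary by machine) comparing `tuple()` and `list()` on a short list like the ones `split()` produces:

```python
import timeit

# A sample list comparable to one split line of the input file
# (the words themselves are arbitrary).
words = "lorem ipsum dolor sit amet".split()

# Time one million conversions each; both are simple O(n) copies.
print("tuple:", timeit.timeit(lambda: tuple(words), number=1_000_000))
print("list: ", timeit.timeit(lambda: list(words), number=1_000_000))
```

On typical hardware each conversion takes well under a microsecond, which matches the profile above: `to_tuple` accounts for only 0.017 s of the 0.167 s total.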
If you see different profiling results, feel free to edit the answer to add more details.
Possible solutions
- Use a generator
```python
iter_lines = (tuple(line.strip().split()) for line in lines)
```
This is useful if you can process the lines asynchronously. For example, if you need to send one API request per line, or publish the lines to a message queue so another process can consume them, a generator lets you pipeline the workload instead of waiting for all lines to be processed first (see the sketch after this list).
However, if you need all lines at once as the input for the next step in your data processing, it won't help much.
- Use another, faster language to process that part of the data
If you need the complete list up front and still have to squeeze out every bit of performance, your best bet is to implement that part in a faster language.
However, I would strongly recommend doing some detailed profiling first. All performance optimization starts with profiling; otherwise it's very easy to make the wrong call and spend effort on something that doesn't actually improve performance much.
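For illustration, here is a minimal sketch of the pipelining idea from the generator option above. `send_record()` is a hypothetical stand-in for whatever per-line work you need, such as an API request or a message-queue publish:

```python
def send_record(record):
    # Hypothetical placeholder: e.g. an HTTP request or a queue publish.
    ...

with open('input.txt', 'r') as f:
    # Iterating the file object directly also avoids holding the whole
    # file in memory, unlike f.readlines().
    iter_lines = (tuple(line.strip().split()) for line in f)
    for record in iter_lines:
        # Each record is handled as soon as it is parsed, instead of
        # waiting for the entire file to be converted first.
        send_record(record)
```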