Let's promote the Levels of Efficiency :
The most efficient ( if not forbidden otherwise ) would be to remove the "\n"-instances beforehand ( using smart and efficient O/S-tools ) and only then process the "rest" of the file-I/O. Note that python internally, by definition, terminates each line with "\n" again, once the file is used as an aFileINPUT-iterator ( in universal-newlines text mode, as noted in documentation ), irrespective of which os.linesep == { "\n" | "\r\n" | "\r" | ... } was actually used for a "line"-separation step on the iterator input-stream.
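A minimal Python 3 sketch of the "\n"-termination noted above, assuming files opened in text mode with the default universal-newlines handling ( the helper name read_lines() is hypothetical and only serves this illustration ):

```python
import os
import tempfile


def read_lines(raw_bytes):
    """Write raw_bytes to a temporary file and return the lines exactly
    as the text-mode file iterator delivers them."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(raw_bytes)
        path = tmp.name
    try:
        # Text mode with the default newline=None: "\r\n" and "\r" are
        # translated to "\n" before the line reaches the caller.
        with open(path, "r") as f:
            return list(f)
    finally:
        os.remove(path)


unix_lines = read_lines(b"alpha\nbeta\n")
dos_lines = read_lines(b"alpha\r\nbeta\r\n")
print(unix_lines)
print(dos_lines)
```

Whatever separator the file was written with, the loop body only ever has to remove a trailing "\n".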
Let's measure the Levels of Efficiency - by decoding the actual flow of operations :
On using map( lambda ):
############################################################# EFFICIENCY LIMITS :
# - pure-[SERIAL]
# - local-GIL-lock
# - local-CPU
# - local-RAM-I/O :
>>> import dis
>>> def a_map_lambda_loop( aFileINPUT ):
...     for line in map( lambda s: s.strip( "\n" ), aFileINPUT ):
...         do_something( line )
>>> dis.dis( a_map_lambda_loop )
  2           0 SETUP_LOOP              36 (to 39)
              3 LOAD_GLOBAL              0 (map)
              6 LOAD_CONST               1 (<code object <lambda> at 0x7ff8fee7b930, file "<stdin>", line 2>)
              9 MAKE_FUNCTION            0
             12 LOAD_FAST                0 (aFileINPUT)
             15 CALL_FUNCTION            2
             18 GET_ITER
        >>   19 FOR_ITER                16 (to 38)
             22 STORE_FAST               1 (line)
  3          25 LOAD_GLOBAL              1 (do_something)
             28 LOAD_FAST                1 (line)
             31 CALL_FUNCTION            1
             34 POP_TOP
             35 JUMP_ABSOLUTE           19
        >>   38 POP_BLOCK
        >>   39 LOAD_CONST               0 (None)
             42 RETURN_VALUE
On using the @chepner-promoted loop:
############################################################# EFFICIENCY LIMITS :
# - pure-[SERIAL]
# - local-GIL-lock
# - local-CPU
# - local-RAM-I/O :
>>> def a_loop_runner( aFileINPUT ):
...     for line in aFileINPUT:
...         line = line.strip( "\n" )
...         do_something( line )
>>> dis.dis( a_loop_runner )
  2           0 SETUP_LOOP              39 (to 42)
              3 LOAD_FAST                0 (aFileINPUT)
              6 GET_ITER
        >>    7 FOR_ITER                31 (to 41)
             10 STORE_FAST               1 (line)
  3          13 LOAD_FAST                1 (line)
             16 LOAD_ATTR                0 (strip)
             19 LOAD_CONST               1 ('\n')
             22 CALL_FUNCTION            1
             25 STORE_FAST               1 (line)
  4          28 LOAD_GLOBAL              1 (do_something)
             31 LOAD_FAST                1 (line)
             34 CALL_FUNCTION            1
             37 POP_TOP
             38 JUMP_ABSOLUTE            7
        >>   41 POP_BLOCK
        >>   42 LOAD_CONST               0 (None)
             45 RETURN_VALUE
On using methodcaller():
############################################################# EFFICIENCY LIMITS :
# - pure-[SERIAL]
# - local-GIL-lock
# - local-CPU
# - local-RAM-I/O :
>>> from operator import methodcaller
>>> def a_methodcaller_loop( aFileINPUT ):
...     for line in map( methodcaller( "strip", "\n" ), aFileINPUT ):
...         do_something( line )
>>> dis.dis( a_methodcaller_loop )
  2           0 SETUP_LOOP              42 (to 45)
              3 LOAD_GLOBAL              0 (map)
              6 LOAD_GLOBAL              1 (methodcaller)
              9 LOAD_CONST               1 ('strip')
             12 LOAD_CONST               2 ('\n')
             15 CALL_FUNCTION            2
             18 LOAD_FAST                0 (aFileINPUT)
             21 CALL_FUNCTION            2
             24 GET_ITER
        >>   25 FOR_ITER                16 (to 44)
             28 STORE_FAST               1 (line)
  3          31 LOAD_GLOBAL              2 (do_something)
             34 LOAD_FAST                1 (line)
             37 CALL_FUNCTION            1
             40 POP_TOP
             41 JUMP_ABSOLUTE           25
        >>   44 POP_BLOCK
        >>   45 LOAD_CONST               0 (None)
             48 RETURN_VALUE
On using an ALAP ( as-late-as-possible ) .strip() call, if the .strip() could not be deferred into do_something() itself, and possibly distributed, for getting even higher efficiency of processing - { pure-[SERIAL] | just-[CONCURRENT] }, { local | independent }-GIL-lock(s), { local | distributed }-CPU, { local | distributed }-RAM-I/O:
############################################################# EFFICIENCY LIMITS :
# - pure-[SERIAL] |+ just-[CONCURRENT]
# - local-GIL-lock|+ independent-GIL-lock
# - local-CPU |+ independent-CPUs
# - local-RAM-I/O |+ independent-RAM-I/O
>>> def ALAP_runner( aFileINPUT ):
...     for line in aFileINPUT:
...         do_something( line.strip( "\n" ) )
>>> dis.dis( ALAP_runner )
  2           0 SETUP_LOOP              33 (to 36)
              3 LOAD_FAST                0 (aFileINPUT)
              6 GET_ITER
        >>    7 FOR_ITER                25 (to 35)
             10 STORE_FAST               1 (line)
  3          13 LOAD_GLOBAL              0 (do_something)
             16 LOAD_FAST                1 (line)
             19 LOAD_ATTR                1 (strip)
             22 LOAD_CONST               1 ('\n')
             25 CALL_FUNCTION            1
             28 CALL_FUNCTION            1
             31 POP_TOP
             32 JUMP_ABSOLUTE            7
        >>   35 POP_BLOCK
        >>   36 LOAD_CONST               0 (None)
             39 RETURN_VALUE
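The bytecode listings show how many steps each loop takes, not how long they take. A hedged Python 3 sketch for timing the four variants on the same in-memory input follows; do_something() here is a hypothetical stand-in for the real per-line work, and the via_*() names, the SAMPLE data and the time_variants() helper are assumptions made for this illustration only:

```python
import io
import timeit
from operator import methodcaller


def do_something(line):
    """Hypothetical stand-in for the real per-line work."""
    return len(line)


def via_map_lambda(f):
    return [do_something(line) for line in map(lambda s: s.strip("\n"), f)]


def via_plain_loop(f):
    out = []
    for line in f:
        line = line.strip("\n")
        out.append(do_something(line))
    return out


def via_methodcaller(f):
    return [do_something(line) for line in map(methodcaller("strip", "\n"), f)]


def via_alap(f):
    # The strip happens as late as possible, right at the call site.
    return [do_something(line.strip("\n")) for line in f]


VARIANTS = (via_map_lambda, via_plain_loop, via_methodcaller, via_alap)
SAMPLE = "some text on a line\n" * 10000   # assumed test input


def time_variants(repeats=5):
    """Return {variant name: best wall-clock seconds over `repeats` runs}."""
    timings = {}
    for fn in VARIANTS:
        runs = timeit.repeat(lambda: fn(io.StringIO(SAMPLE)),
                             number=1, repeat=repeats)
        timings[fn.__name__] = min(runs)
    return timings


if __name__ == "__main__":
    for name, seconds in sorted(time_variants().items(), key=lambda kv: kv[1]):
        print("%-18s %.6f s" % (name, seconds))
```

The ranking depends on the interpreter version and on how heavy the real do_something() is, so it is worth re-running with representative data before drawing conclusions.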
More details are heavily dependent on the nature of do_something() and on the actual costs under the overhead-strict re-formulated Amdahl's Law ( count all the add-on overhead costs and add to them also the process-communication costs: the pickle{ .dumps() | .loads() }-based SER/DES costs and the IPC-{ channel | network }-communication latencies ) when going from a pure-[SERIAL] to a just-[CONCURRENT] processing, the more so if { process | node }-distributed.
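As a rough illustration of why the add-on costs matter, here is a minimal sketch of one simple overhead-aware speedup estimate. It assumes all setup, SER/DES and communication costs can be lumped into a single `overhead` term expressed as a fraction of the original serial runtime; this is an assumption of the sketch, not necessarily the exact formulation meant above, and the function name speedup() is hypothetical:

```python
def speedup(p, n, overhead=0.0):
    """Estimate the speedup of a workload on n workers.

    p        -- fraction of the original runtime that can run concurrently
    n        -- number of workers
    overhead -- lumped add-on costs (process setup, pickle SER/DES, IPC),
                as a fraction of the original serial runtime (assumption)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    if overhead < 0.0:
        raise ValueError("overhead must not be negative")
    return 1.0 / ((1.0 - p) + p / n + overhead)


if __name__ == "__main__":
    for ovh in (0.0, 0.1, 1.0):
        print("overhead=%.1f -> speedup=%.3f" % (ovh, speedup(0.9, 4, ovh)))
```

With a cheap do_something(), the overhead term can easily dominate, and the estimate drops below 1.0, i.e. the "concurrent" version runs slower than the pure-[SERIAL] one.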
On a list-comprehension with an if-based member-allocator - pure-[SERIAL], local-GIL-lock, local-CPU, local-RAM-I/O ( awfully un-protected from an un-salvageable MemoryError crash, as the on-the-fly syntax-constructor allocates memory for the whole list ):
############################################################# EFFICIENCY LIMITS :
# - pure-[SERIAL]
# - local-GIL-lock
# - local-CPU
# - local-RAM-I/O :
>>> def anOnTheFlyGrowingListComprehension( self ):
...     results = [x for x in list(set(self.database.values())) if x.startswith(text)]
>>> dis.dis( anOnTheFlyGrowingListComprehension )
  2           0 BUILD_LIST               0
              3 LOAD_GLOBAL              0 (list)
              6 LOAD_GLOBAL              1 (set)
              9 LOAD_FAST                0 (self)
             12 LOAD_ATTR                2 (database)
             15 LOAD_ATTR                3 (values)
             18 CALL_FUNCTION            0
             21 CALL_FUNCTION            1
             24 CALL_FUNCTION            1
             27 GET_ITER
        >>   28 FOR_ITER                27 (to 58)
             31 STORE_FAST               1 (x)
             34 LOAD_FAST                1 (x)
             37 LOAD_ATTR                4 (startswith)
             40 LOAD_GLOBAL              5 (text)
             43 CALL_FUNCTION            1
             46 POP_JUMP_IF_FALSE       28
             49 LOAD_FAST                1 (x)
             52 LIST_APPEND              2
             55 JUMP_ABSOLUTE           28
        >>   58 STORE_FAST               2 (results)
             61 LOAD_CONST               0 (None)
             64 RETURN_VALUE
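One way to soften the memory exposure of the comprehension above is to hand the matches out lazily. The following is only a sketch: it assumes `database` is a plain dict of strings, the function name iter_matching() is hypothetical, and the set of distinct values is still built in memory; only the extra list() copy and the result list are avoided:

```python
def iter_matching(database, text):
    """Lazily yield each distinct value of `database` that starts with `text`."""
    # set() still materialises the distinct values once; the generator
    # expression then yields matches one at a time instead of a full list.
    return (value for value in set(database.values()) if value.startswith(text))


if __name__ == "__main__":
    sample = {"a": "apple", "b": "apricot", "c": "banana", "d": "apple"}
    for match in iter_matching(sample, "ap"):
        print(match)
```

The consumer decides how many matches to pull, so a huge result never has to exist as one list.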
or yet another, closer view on an iterator-formulated pure-[SERIAL] "front"-end .strip()-er:
############################################################# EFFICIENCY LIMITS :
# - pure-[SERIAL]
# - local-GIL-lock
# - local-CPU
# - local-RAM-I/O :
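>>> # NOTE: Python 2's dis.dis() treats a str argument as raw bytecode, not as source code,
>>> #       so the listing below merely decodes the characters of the string ( '(' == 40 == STORE_SLICE+0, ' ' == 32 == SLICE+2, ... )
>>> #       and is NOT a disassembly of the generator expression; compile() the source first to inspect the real bytecode
>>> dis.dis( '( do_something( line.strip( "\n" ) ) for line in aFileINPUT )' )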
0 STORE_SLICE+0
1 SLICE+2
2 LOAD_CONST 24431 (24431)
5 POP_JUMP_IF_TRUE 28015
8 LOAD_NAME 26740 (26740)
11 BUILD_MAP 26478
14 STORE_SLICE+0
15 SLICE+2
16 IMPORT_NAME 28265 (28265)
19 LOAD_NAME 29486 (29486)
22 LOAD_GLOBAL 26994 (26994)
25 JUMP_IF_TRUE_OR_POP 8232
28 <34>
29 UNARY_POSITIVE
30 <34>
31 SLICE+2
32 STORE_SLICE+1
33 SLICE+2
34 STORE_SLICE+1
35 SLICE+2
36 BUILD_TUPLE 29295
39 SLICE+2
40 IMPORT_NAME 28265 (28265)
43 LOAD_NAME 26912 (26912)
46 JUMP_FORWARD 24864 (to 24913)
49 PRINT_EXPR
50 BUILD_MAP 25964
53 PRINT_ITEM_TO
54 INPLACE_XOR
55 BREAK_LOOP
56 EXEC_STMT
57 IMPORT_STAR
58 SLICE+2
59 STORE_SLICE+1
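The listing above decodes the characters of the string rather than compiled code. A minimal sketch of how the generator expression itself could be disassembled follows: compile the source first, then pick the nested `<genexpr>` code object. The names SOURCE and compile_genexpr() are hypothetical, and the exact opcodes printed depend on the interpreter version:

```python
import dis

# "\\n" keeps a backslash-n escape inside the source text; a literal newline
# inside the quoted string would make the expression fail to compile.
SOURCE = '( do_something( line.strip( "\\n" ) ) for line in aFileINPUT )'


def compile_genexpr(source=SOURCE):
    """Compile the expression and return (outer_code, genexpr_code)."""
    outer = compile(source, "<string>", "eval")
    # The generator body lives in a nested code object among the constants.
    inner = next(c for c in outer.co_consts if hasattr(c, "co_code"))
    return outer, inner


outer_code, genexpr_code = compile_genexpr()
dis.dis(genexpr_code)   # the per-line loop body of the generator expression
```

Neither do_something nor aFileINPUT has to exist for this to work, since names are only resolved when the code is actually run.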