
What is the cost (if any) of calling the str function on an object that is already a string? The use case is normalizing an array of objects of different types by converting them all to strings. Naively, it can be implemented like so:

def arr_2_strarr(arr):
    return [str(val) for val in arr]

But if str() causes too much overhead, and my arr contains primarily strings, I may consider using:

def arr_2_strarr2(arr):
    return [str(val) if not isinstance(val, basestring) else val for val in arr]

Any suggestions?

martineau
benjaminz
  • It's pretty cheap: it just returns the original string object. Calling `isinstance` explicitly will definitely be slower. – PM 2Ring Jun 08 '17 at 15:03
  • And you would have to replace it with an `if` statement checking for *not strings*, which is also not free. – Ma0 Jun 08 '17 at 15:04
  • You can always use IPython's [%timeit](http://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit) to choose the best-suited solution. I have found an old but well-written [article](http://pynash.org/2013/03/06/timing-and-profiling/) that may help. – raratiru Jun 08 '17 at 15:16
  • It certainly is cheaper than checking for the instance type... – zwer Jun 08 '17 at 15:20
  • Possible duplicate of [Should I avoid converting to a string if a value is already a string?](https://stackoverflow.com/questions/42223125/should-i-avoid-converting-to-a-string-if-a-value-is-already-a-string) – Evan Carroll May 10 '18 at 21:16

2 Answers

Calling str on a string object is pretty cheap: it just returns the original string object. Calling isinstance explicitly will definitely be slower.
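
A quick way to see this for yourself is the small sketch below. It assumes CPython; the identity check relies on an implementation detail rather than a language guarantee, and the variable names are only illustrative.

```python
# For an exact str instance, CPython's str() hands back the same object,
# so no copy is made.
s = "already a string"
same_object = str(s) is s

# A non-string value still gets converted to a new str object.
n = 42
converted = str(n)

print(same_object)   # True on CPython
print(converted)     # 42
```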

If you want to test this on real data, take a look at the timeit module.
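
As a starting point, here is a minimal timeit sketch for Python 3 (so it tests against str instead of basestring). The sample data and the function names plain and checked are made up for illustration; substitute your own list to get meaningful numbers.

```python
import timeit

# Hypothetical sample data: mostly strings with a few integers mixed in.
arr = ["spam", "eggs", 1, "ham", 2] * 20

def plain(arr):
    return [str(val) for val in arr]

def checked(arr):
    return [val if isinstance(val, str) else str(val) for val in arr]

# repeat() returns one total time per repetition; the minimum is the
# value least disturbed by other processes.
for func in (plain, checked):
    best = min(timeit.repeat(lambda: func(arr), repeat=3, number=1000))
    print("{0:8} {1:.4f} s per 1000 calls".format(func.__name__, best))
```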

BTW, you should eliminate the not from your 2nd version:

[val if isinstance(val, basestring) else str(val) for val in arr]

And you can speed things up slightly by caching str:

def arr_2_strarr(arr, str=str):
    return [str(val) for val in arr]

Happy micro-optimizing. :)


Why cache str? Well, each time you use a name, Python has to look it up. Inside a function it first looks in the local namespace; if the name isn't there, it looks in the module's globals and then in the built-ins. So a bare str, although built-in, is only found after the global lookup has failed; it would be inefficient to "import" all the built-ins into every function. By doing

def arr_2_strarr(arr, str=str):

we create a local name str that gets bound to the built-in str type, and because it's a default argument, that search-and-bind process happens once, when the function definition is executed, not each time the function is called.

So each time we call arr_2_strarr the interpreter will immediately find that local str, which will save a tiny amount of time.
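
The following sketch illustrates the default-argument trick and one caveat it brings. It only demonstrates the binding behaviour, not the speed-up, and the sample inputs are invented for the example.

```python
def arr_2_strarr(arr, str=str):
    # `str` here is a local name, bound once when the `def` was executed.
    return [str(val) for val in arr]

# The binding is stored on the function object itself.
print(arr_2_strarr.__defaults__ == (str,))   # True

print(arr_2_strarr([1, "two", 3.0]))         # ['1', 'two', '3.0']

# Caveat: the cache is also an ordinary parameter, so a caller can
# replace it, deliberately or by accident.
print(arr_2_strarr([1, "two"], str=repr))    # ['1', "'two'"]
```

If that extra parameter is a concern, binding the name to a local variable at the top of the function body is an alternative, at the cost of one lookup per call instead of one per definition.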


Here's some timeit code that compares the various strategies. It runs on both Python 2 and Python 3, although on Python 3 it substitutes str for basestring, since basestring doesn't exist in Python 3.

This code runs the functions on lists of various sizes, first with integer data, then with string data, which is created by converting the integer data to strings.

Each line of output lists the times for 3 repetitions, each performing the given number of loops, sorted from fastest to slowest. As the timeit repeat docs mention, the main number to look at in each run is the smallest one.

The results for all functions on a given list size and type are also sorted from fastest to slowest.

''' Compare the speeds of direct string conversion
    with testing first via isinstance

    See https://stackoverflow.com/q/44439323/4014959

    Written by PM 2Ring 2017.06.09

    Python 2 / 3 compatible
'''

from __future__ import print_function, division
from timeit import Timer
import sys

# Python 3 doesn't have basestring
if sys.version_info[0] > 2:
    basestring = str

# The functions to test
def plain(arr):
    return [str(val) for val in arr]

def cached(arr, str=str):
    return [str(val) for val in arr]

def teststr(arr):
    return [val if isinstance(val, str) else str(val) for val in arr]

def testbase(arr):
    return [val if isinstance(val, basestring) else str(val) for val in arr]

def testbasenot(arr):
    return [str(val) if not isinstance(val, basestring) else val for val in arr]

funcs = (
    plain,
    cached,
    teststr,
    testbase,
    testbasenot,
)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

def verify(arr):
    results = [func(arr) for func in funcs]
    first, results = results[0], results[1:]
    return all(first == u for u in results)

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        setup = 'from __main__ import arr, ' + fname
        cmd = fname + '(arr)'
        t = Timer(cmd, setup)
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    timings.sort()
    for result, fname in timings:
        print('{0:12} {1}'.format(fname, result))

# Check that all functions return the same results
if 0:
    print('Testing all functions')
    arr = list(range(10))
    print(arr, verify(arr))
    arr = list('abcdefghij')
    print(arr, verify(arr))

# Do the timing tests
reps = 3
loops = 1 << 16
for i in range(1, 11):
    n = 1 << i
    # Build a data array of integers
    arr = range(n)
    print('\n{0}: Size={1}, Loops={2}'.format(i, n, loops))
    print('* Integer')
    time_test(loops, reps)

    # Convert the data array contents to strings
    arr = cached(arr)
    print('\n* String')
    time_test(loops, reps)
    loops >>= 1

Typical Python 2 output

1: Size=2, Loops=65536
* Integer
cached       [0.17268610000610352, 0.19634914398193359, 0.2058720588684082]
plain        [0.17906594276428223, 0.18797492980957031, 0.24009895324707031]
teststr      [0.32513308525085449, 0.33270597457885742, 0.35080599784851074]
testbasenot  [0.32793092727661133, 0.33176803588867188, 0.33498501777648926]
testbase     [0.32964491844177246, 0.33154511451721191, 0.33760714530944824]

* String
cached       [0.1619560718536377, 0.1628870964050293, 0.16448402404785156]
teststr      [0.16335082054138184, 0.16484308242797852, 0.17012500762939453]
plain        [0.16956901550292969, 0.1711430549621582, 0.18457293510437012]
testbase     [0.22378706932067871, 0.2255101203918457, 0.22593879699707031]
testbasenot  [0.22855901718139648, 0.22941207885742188, 0.23271608352661133]

2: Size=4, Loops=32768
* Integer
cached       [0.12796807289123535, 0.12807202339172363, 0.12817001342773438]
plain        [0.13622713088989258, 0.14297294616699219, 0.14868402481079102]
teststr      [0.27701020240783691, 0.27812099456787109, 0.2795259952545166]
testbasenot  [0.27815794944763184, 0.28220701217651367, 0.29373884201049805]
testbase     [0.2804868221282959, 0.28186416625976562, 0.31699705123901367]

* String
cached       [0.12131500244140625, 0.12241697311401367, 0.13379192352294922]
teststr      [0.12839889526367188, 0.1314079761505127, 0.14053797721862793]
plain        [0.13051795959472656, 0.14696002006530762, 0.18504786491394043]
testbase     [0.18404412269592285, 0.1844489574432373, 0.19633579254150391]
testbasenot  [0.18416285514831543, 0.18494606018066406, 0.18553614616394043]

3: Size=8, Loops=16384
* Integer
cached       [0.10957002639770508, 0.11252093315124512, 0.11768913269042969]
plain        [0.11848998069763184, 0.11958003044128418, 0.1292269229888916]
testbase     [0.26231694221496582, 0.26471304893493652, 0.26625895500183105]
teststr      [0.26410102844238281, 0.2641758918762207, 0.26569199562072754]
testbasenot  [0.26910495758056641, 0.26967120170593262, 0.2741539478302002]

* String
cached       [0.102294921875, 0.10357999801635742, 0.1050269603729248]
teststr      [0.10852217674255371, 0.10861611366271973, 0.1127161979675293]
plain        [0.11173510551452637, 0.11183404922485352, 0.12115597724914551]
testbasenot  [0.16488981246948242, 0.16509699821472168, 0.16648602485656738]
testbase     [0.16622614860534668, 0.16688108444213867, 0.16962814331054688]

4: Size=16, Loops=8192
* Integer
cached       [0.10548806190490723, 0.10568594932556152, 0.10611891746520996]
plain        [0.11526799201965332, 0.1160120964050293, 0.12486004829406738]
teststr      [0.25309896469116211, 0.25549888610839844, 0.25838899612426758]
testbasenot  [0.25410699844360352, 0.27252411842346191, 0.32510590553283691]
testbase     [0.25414609909057617, 0.26968812942504883, 0.27393984794616699]

* String
cached       [0.092885017395019531, 0.096045970916748047, 0.10643196105957031]
teststr      [0.098433017730712891, 0.098783016204833984, 0.10051798820495605]
plain        [0.10081005096435547, 0.10222005844116211, 0.12018895149230957]
testbasenot  [0.15373396873474121, 0.15472292900085449, 0.15676999092102051]
testbase     [0.15490198135375977, 0.15572404861450195, 0.15599799156188965]

5: Size=32, Loops=4096
* Integer
cached       [0.10568094253540039, 0.10743498802185059, 0.1115870475769043]
plain        [0.1163330078125, 0.11633419990539551, 0.12796401977539062]
teststr      [0.25122308731079102, 0.26527810096740723, 0.26579189300537109]
testbase     [0.25309586524963379, 0.25563716888427734, 0.25917816162109375]
testbasenot  [0.25465011596679688, 0.25907588005065918, 0.26110982894897461]

* String
cached       [0.085406064987182617, 0.086378097534179688, 0.08651280403137207]
teststr      [0.092473983764648438, 0.09324193000793457, 0.093439817428588867]
plain        [0.096549034118652344, 0.097501993179321289, 0.10462403297424316]
testbase     [0.14794015884399414, 0.14966106414794922, 0.15016818046569824]
testbasenot  [0.14796280860900879, 0.14940309524536133, 0.15308189392089844]

6: Size=64, Loops=2048
* Integer
cached       [0.10838603973388672, 0.1089630126953125, 0.11129999160766602]
plain        [0.11764693260192871, 0.11851096153259277, 0.12583494186401367]
teststr      [0.2550208568572998, 0.25540995597839355, 0.26316595077514648]
testbase     [0.25723910331726074, 0.25930881500244141, 0.26207089424133301]
testbasenot  [0.25864100456237793, 0.25901007652282715, 0.26875495910644531]

* String
cached       [0.086635112762451172, 0.087384939193725586, 0.099885940551757812]
plain        [0.096493959426879883, 0.12469196319580078, 0.13684391975402832]
teststr      [0.096681118011474609, 0.098448991775512695, 0.10569310188293457]
testbase     [0.14573216438293457, 0.14696693420410156, 0.14700508117675781]
testbasenot  [0.14776277542114258, 0.14852094650268555, 0.15462112426757812]

7: Size=128, Loops=1024
* Integer
cached       [0.10915207862854004, 0.11011981964111328, 0.1127631664276123]
plain        [0.11721491813659668, 0.11830401420593262, 0.1254270076751709]
testbase     [0.25789499282836914, 0.26130795478820801, 0.26179313659667969]
teststr      [0.25840306282043457, 0.25889492034912109, 0.26300287246704102]
testbasenot  [0.26443600654602051, 0.26498103141784668, 0.26691412925720215]

* String
cached       [0.083537101745605469, 0.084954023361206055, 0.086431980133056641]
teststr      [0.091158866882324219, 0.09123992919921875, 0.091590166091918945]
plain        [0.091225862503051758, 0.092115163803100586, 0.099261045455932617]
testbase     [0.14569401741027832, 0.14622306823730469, 0.14650607109069824]
testbasenot  [0.14774990081787109, 0.14930200576782227, 0.15020990371704102]

8: Size=256, Loops=512
* Integer
cached       [0.10824894905090332, 0.10865211486816406, 0.10895800590515137]
plain        [0.11750102043151855, 0.12690877914428711, 0.12890195846557617]
teststr      [0.25457501411437988, 0.25542402267456055, 0.25692200660705566]
testbasenot  [0.25513482093811035, 0.25664496421813965, 0.25999689102172852]
testbase     [0.25680398941040039, 0.25924396514892578, 0.26179695129394531]

* String
cached       [0.080662012100219727, 0.081827878952026367, 0.081900119781494141]
teststr      [0.089673995971679688, 0.097939014434814453, 0.15471792221069336]
plain        [0.094327926635742188, 0.095342159271240234, 0.097375154495239258]
testbasenot  [0.14262199401855469, 0.14278602600097656, 0.14302182197570801]
testbase     [0.14464497566223145, 0.14674210548400879, 0.16207790374755859]

9: Size=512, Loops=256
* Integer
cached       [0.10789299011230469, 0.1092069149017334, 0.110015869140625]
plain        [0.11702799797058105, 0.1181950569152832, 0.12698101997375488]
testbase     [0.25504207611083984, 0.25520896911621094, 0.25734806060791016]
testbasenot  [0.25715017318725586, 0.25747489929199219, 0.25850796699523926]
teststr      [0.25783085823059082, 0.25882315635681152, 0.26154208183288574]

* String
cached       [0.078849077224731445, 0.079813003540039062, 0.084489107131958008]
teststr      [0.086745977401733398, 0.087059974670410156, 0.087485074996948242]
plain        [0.088322877883911133, 0.088804960250854492, 0.097378969192504883]
testbasenot  [0.14128994941711426, 0.14266705513000488, 0.1427910327911377]
testbase     [0.14152097702026367, 0.14231991767883301, 0.14392399787902832]

10: Size=1024, Loops=128
* Integer
cached       [0.10892415046691895, 0.11003899574279785, 0.11008000373840332]
plain        [0.1192779541015625, 0.12048506736755371, 0.12956619262695312]
teststr      [0.25335502624511719, 0.25642204284667969, 0.25892996788024902]
testbase     [0.25525593757629395, 0.25550699234008789, 0.25794696807861328]
testbasenot  [0.25932693481445312, 0.25960803031921387, 0.26134610176086426]

* String
cached       [0.078451156616210938, 0.080369949340820312, 0.080511093139648438]
teststr      [0.084844112396240234, 0.085949897766113281, 0.096578836441040039]
plain        [0.086302042007446289, 0.087638139724731445, 0.096364974975585938]
testbase     [0.14068913459777832, 0.14274501800537109, 0.15559101104736328]
testbasenot  [0.14075493812561035, 0.15553092956542969, 0.19578790664672852]

Typical Python 3 output

1: Size=2, Loops=65536
* Integer
plain        [0.2957206170030986, 0.2959696320031071, 0.2991539639988332]
cached       [0.3058611470005417, 0.30598287599787, 0.3073535650000849]
testbase     [0.38803433800057974, 0.39307209699836676, 0.393392562000372]
testbasenot  [0.3888578799997049, 0.3951267439988442, 0.42909636100011994]
teststr      [0.41290506400036975, 0.41541150199918775, 0.4488242949992127]

* String
testbase     [0.23906823500146857, 0.23946705200069118, 0.24624350399972172]
testbasenot  [0.24037985899849446, 0.24200722000023234, 0.2462738950016501]
plain        [0.25742501500280923, 0.2644229819998145, 0.26711930600140477]
teststr      [0.2635171010006161, 0.3559218000009423, 0.3784064870014845]
cached       [0.2687887559986848, 0.2711959320004098, 0.38138879500183975]

2: Size=4, Loops=32768
* Integer
cached       [0.21332427200104576, 0.21363574399947538, 0.21528891600246425]
plain        [0.22395663199858973, 0.22762144099760917, 0.23422862100051134]
testbasenot  [0.31939790100295795, 0.32413787499899627, 0.32422161499926005]
testbase     [0.3209382370005187, 0.3213516770010756, 0.3215230670029996]
teststr      [0.3372085839982901, 0.33786465500088525, 0.33847540900023887]

* String
testbasenot  [0.17031173299983493, 0.17143720199965173, 0.17724975699820789]
testbase     [0.170390128998406, 0.17118954800025676, 0.18865150499914307]
cached       [0.18190538799899514, 0.18262020299880533, 0.183105569001782]
plain        [0.18666503399799694, 0.18781541300268145, 0.1955128590016102]
teststr      [0.18973677000030875, 0.19112570400102413, 0.19168143299975782]

3: Size=8, Loops=16384
* Integer
cached       [0.17012267099926248, 0.18160372200145503, 0.2275817529989581]
plain        [0.1890079689983395, 0.1963043950017891, 0.2016476179996971]
testbasenot  [0.28168991999700665, 0.2821743839995179, 0.286649605997809]
testbase     [0.28295213199817226, 0.28760008400058723, 0.2906435440017958]
teststr      [0.2958552290001535, 0.2989299110013235, 0.31747390199961956]

* String
testbase     [0.13354753000021446, 0.13377505199969164, 0.14039257600234123]
cached       [0.1352838150014577, 0.1353432000032626, 0.13798289999976987]
testbasenot  [0.14252334699995117, 0.14301740500013693, 0.1445914210016781]
plain        [0.15130633899752866, 0.15166569000211894, 0.1616801599993778]
teststr      [0.15267008800219628, 0.1545946529986395, 0.15590016200076207]

4: Size=16, Loops=8192
* Integer
cached       [0.144755126999371, 0.14782401300180936, 0.1484048439997423]
plain        [0.1726092749995587, 0.1740606339990336, 0.1815100200001325]
testbase     [0.26685525399807375, 0.27029573199979495, 0.2716258750006091]
testbasenot  [0.2702714350016322, 0.2723204169997189, 0.27288546099953237]
teststr      [0.28515160999813816, 0.28523068700087606, 0.2878553769987775]

* String
cached       [0.11515368599793874, 0.11579233700103941, 0.11688366999806021]
testbase     [0.12178990400207113, 0.13090817400006927, 0.13304468899877975]
testbasenot  [0.13121789299839293, 0.14976675499929115, 0.1521548589989834]
teststr      [0.13410512400150765, 0.1354981399999815, 0.147247362001508]
plain        [0.13691626099898713, 0.1384456069972657, 0.1426525679999031]

5: Size=32, Loops=4096
* Integer
cached       [0.13246865899782279, 0.13320018100057496, 0.134628559997509]
plain        [0.1636957459995756, 0.16763203899972723, 0.1752369269997871]
testbase     [0.26010187700012466, 0.2606812570011243, 0.2647345440018398]
testbasenot  [0.2620696090016281, 0.26230394700178294, 0.26258907899682526]
teststr      [0.27685887300322065, 0.2787095199964824, 0.28293989099984174]

* String
cached       [0.10246079200078384, 0.10416977099885116, 0.10755630499988911]
testbasenot  [0.10829716499938513, 0.10918466699877172, 0.10935586699997657]
testbase     [0.11739019699962228, 0.11808202800239087, 0.11899654000080773]
plain        [0.12601002500014147, 0.12718953500007046, 0.13454839599944535]
teststr      [0.13366336599938222, 0.13407608800116577, 0.13510101700012456]

6: Size=64, Loops=2048
* Integer
cached       [0.12591946799875586, 0.127094235002005, 0.13223557899982552]
plain        [0.160616523000499, 0.16232994500023779, 0.1691623620026803]
testbase     [0.2534341589998803, 0.2556092949998856, 0.2571690379991196]
testbasenot  [0.2560774869998568, 0.2574564010028553, 0.2606996459981019]
teststr      [0.268248238000524, 0.2702014210008201, 0.27107579600124154]

* String
cached       [0.09791737100022146, 0.09819723300097394, 0.10752435399990645]
testbasenot  [0.1057888709983672, 0.10588572099732119, 0.16173565400094958]
testbase     [0.10636284599968349, 0.1179599219976808, 0.12130766799964476]
plain        [0.12285572399923694, 0.12589510299949325, 0.13114397300159908]
teststr      [0.13122114399811835, 0.13273253399893292, 0.14575592999972287]

7: Size=128, Loops=1024
* Integer
cached       [0.12404713899741182, 0.12496110600113752, 0.12496385000122245]
plain        [0.15980284800025402, 0.16046370399999432, 0.16711239899814245]
testbasenot  [0.25531527800194453, 0.25563639699976193, 0.2586420219995489]
testbase     [0.25544935799916857, 0.2558138679996773, 0.257172014000389]
teststr      [0.2699256220003008, 0.2712909309993847, 0.27702098800000385]

* String
cached       [0.09376715399776003, 0.09393715400074143, 0.09975314399707713]
testbasenot  [0.10510071799944853, 0.10511873200084665, 0.10523289399861824]
testbase     [0.11240010600158712, 0.11325187799957348, 0.11632439300228725]
plain        [0.12139380200096639, 0.12202585699924384, 0.1315958569975919]
teststr      [0.12834531499902369, 0.12949470400053542, 0.12955383699954837]

8: Size=256, Loops=512
* Integer
cached       [0.12225364700134378, 0.12283446399669629, 0.1285843859986926]
plain        [0.15971405900199898, 0.16198832800000673, 0.16777605400056927]
testbase     [0.2507534860014857, 0.2527904779999517, 0.25378678199922433]
testbasenot  [0.25323686200135853, 0.2547167230004561, 0.25919888999851537]
teststr      [0.2652072370001406, 0.2658402630004275, 0.2674206650008273]

* String
cached       [0.0906629850032914, 0.0985801380011253, 0.09929232800277532]
testbase     [0.10155730300175492, 0.1042869699995208, 0.11276149599871133]
testbasenot  [0.10197166099897004, 0.11451221999959671, 0.15595895300066331]
plain        [0.11898361400017166, 0.12018223199993372, 0.12760113599870238]
teststr      [0.12645652200080804, 0.12671815700014122, 0.14095144699967932]

9: Size=512, Loops=256
* Integer
cached       [0.12672984500022721, 0.1462409830019169, 0.2653043659993273]
plain        [0.161721200998727, 0.17296033000093303, 0.19699998799842433]
testbase     [0.25432757399903494, 0.25851125400004094, 0.258548003002943]
testbasenot  [0.25619441399976495, 0.25656893900304567, 0.25998359599907417]
teststr      [0.2719232039999042, 0.2744571339972026, 0.2751794379983039]

* String
cached       [0.08841608199873008, 0.08848714099804056, 0.09124958899701596]
testbasenot  [0.09962382599769626, 0.10016373899998143, 0.10028601600060938]
testbase     [0.10713129000214394, 0.10752918499929365, 0.10952026399900205]
plain        [0.1163020489984774, 0.12190789400119684, 0.1264930679972167]
teststr      [0.1242994140011433, 0.12458201900153654, 0.12523995000083232]

10: Size=1024, Loops=128
* Integer
cached       [0.12827690600170172, 0.1294701549995807, 0.13387694999983069]
plain        [0.16636216699771467, 0.16866590399877168, 0.17549873600000865]
testbasenot  [0.25435296399882645, 0.25515673799964134, 0.2605281959986314]
testbase     [0.26351416900070035, 0.26398584699927596, 0.2651360300005763]
teststr      [0.26816077799958293, 0.26908816800278146, 0.2715630999991845]

* String
cached       [0.08827024300262565, 0.09090095799911069, 0.09729095900183893]
testbase     [0.10063145499952952, 0.1010660120009561, 0.10904535399822635]
testbasenot  [0.10313185999984853, 0.11444468399713514, 0.14796407999892836]
plain        [0.11569941500056302, 0.11579339799936861, 0.12615105800068704]
teststr      [0.12353994099976262, 0.12515813500067452, 0.13752399999793852]

These timings were performed on a rather old 32-bit single-core 2 GHz machine with 2 GB of RAM, running a Debian derivative of Linux. I used Python 2.6.6 and Python 3.6.0. Your results may vary. ;) In any case, these results should only be used as a rough guide: timeit does a pretty good job of timing only the stuff we want to time, but it has no control over other processes that also want to use the CPU.

PM 2Ring

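    import time

    # Python 2 code (it uses basestring). Note that the third loop runs
    # ten times as many iterations as the first two.
    string = 'string'

    # 1. Type check only, on a value that is already a string
    start_time = time.time()
    for i in range(100000):
        if isinstance(string, basestring):
            continue
    end_time = time.time()
    print(end_time - start_time)

    # 2. Call str() on a string
    start_time = time.time()
    for i in range(100000):
        str(string)
    end_time = time.time()
    print(end_time - start_time)

    # 3. Call str() on an integer (1,000,000 iterations)
    number = 9
    start_time = time.time()
    for i in range(1000000):
        str(number)
    end_time = time.time()
    print(end_time - start_time)

    # Output (seconds):
    # 0.031
    # 0.016
    # 0.27999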

In these test cases, just calling str(string) was about twice as fast as the isinstance check (0.016 s versus 0.031 s for 100,000 iterations).
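
Because the third loop above runs ten times as many iterations as the other two, the raw totals are not directly comparable. A rough Python 3 sketch that normalizes each measurement to seconds per call might look like this. The helper per_call and the lambda wrappers are hypothetical names introduced here; each case pays the same wrapper overhead, so only the differences between the numbers are meaningful.

```python
import time

def per_call(func, arg, n=100000):
    """Return the average seconds per call of func(arg) over n calls."""
    start = time.perf_counter()
    for _ in range(n):
        func(arg)
    return (time.perf_counter() - start) / n

# Wrap every case in a lambda so all three carry the same call overhead.
# isinstance(v, str) is the Python 3 stand-in for the basestring test.
cases = [
    ("isinstance(string)", lambda v: isinstance(v, str), "string"),
    ("str(string)", lambda v: str(v), "string"),
    ("str(9)", lambda v: str(v), 9),
]

results = {label: per_call(func, arg) for label, func, arg in cases}
for label, seconds in results.items():
    print("{0:20} {1:.3e} s per call".format(label, seconds))
```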

alexjones