
Reading some Python code, I discovered this syntax, `if a[i:] < b[j:]`, and the colon threw me for a loop. I found this great question/answer about it:

Colon (:) in Python list index

But then I looked back at my code example, and it's still unclear to me how a comparison can use what I understand to be shorthand for slicing.

I'm attempting to reverse engineer this into an equivalent JavaScript function. That weird comparison is the only thing I can't comprehend. What exactly is Python comparing? String length, or something else?

def combineStrings(a, b):
    answer = ''
    a += '~'
    b += '~'
    i = 0
    j = 0
    while a[i] != '~' or b[j] != '~':
        print (i, a[i:], b[j:], a[i:] < b[j:])
        if a[i] != '~' and a[i:] < b[j:]:
            answer += a[i]
            i += 1
        else:
            answer += b[j]
            j += 1
    print (answer)

combineStrings('TACO', 'CAT')

Output

0 TACO~ CAT~ False
0 TACO~ AT~ False
0 TACO~ T~ True
1 ACO~ T~ True
2 CO~ T~ True
3 O~ T~ True
4 ~ T~ False
CATACOT
AnonymousSB
  • What are you really asking here? – ddor254 Nov 19 '18 at 09:57
  • @ddor254 I'm asking how slice notation can be used to compare strings. Does Python automatically compare length, or is it comparing something else? – AnonymousSB Nov 19 '18 at 09:58
  • Don't let the last colon confuse you, it belongs to the `if` statement. `a[i:]` and `b[j:]` is just the standard notation for `a` from the i-th index on until the end. – Christian König Nov 19 '18 at 09:59
  • @AnonymousSB - in this case, it is comparing substrings. So `"TACO~" < "CAT~"` yields `False`. – Christian König Nov 19 '18 at 09:59
  • But `"TACO~" < "CAT~"` is `False` in one check, and then later `"TACO~" < "T~"` is `True`. So clearly it can't be string length that it's comparing. – AnonymousSB Nov 19 '18 at 10:01
  • So your problem is not about string slicing, but string comparison. It is a "simple" lexicographical comparison... `T` == `T`, but `A` < `~`... – Christian König Nov 19 '18 at 10:02
  • It is [lexicographically comparing](https://en.wikipedia.org/wiki/Lexicographical_order) substrings. `TACO~` would come after `CAT~` in a dictionary, but before `T~`. – Amadan Nov 19 '18 at 10:03
  • btw, using `+=` for strings is generally not recommended for long strings, you can build a list of characters and join them at the end – iamanigeeit Nov 23 '18 at 09:10
  • Running a join on an array, especially if it's long, seems like a performance hit, having to iterate over each item. I'm not familiar with the inner workings of Python, but here's a [performance test](https://jsperf.com/concat-string-vs-join-2/1) in JavaScript. 822 million vs. 47 million operations per second. – AnonymousSB Nov 23 '18 at 12:38
  • Because strings are immutable, `a += b` requires creating a new string and reassigning `a`. See https://stackoverflow.com/questions/39675898/is-python-string-concatenation-bad-practice The difference isn't that big because Python is smart enough not to keep re-traversing the strings (hence linear instead of O(n^2)). I just tested joining a list vs `+=`, and joining is about 80% faster for long strings. – iamanigeeit Sep 28 '20 at 15:47

1 Answer


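It's comparing the two slices by lexicographical order: character by character, by code point, with the first differing character deciding the result.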

If you're trying to find the character in b (T) that is at least as big as the current character in a (T), and insert all the consecutive letters in a (A, C, O) that are smaller than that character in b, this code makes sense.

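~ is the biggest printable ASCII character (126), hence it's used as an end-of-string sentinel: a string that has been used up (only `~` left) compares greater than whatever remains of the other one.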
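A quick way to check that claim yourself, assuming the inputs are plain ASCII (the variable name `sentinel_is_largest` is just for illustration):

```python
import string

# '~' has code point 126.
print(ord('~'))  # 126

# Every character in string.printable (digits, letters, punctuation,
# whitespace) compares less than or equal to '~'.
sentinel_is_largest = all(ch <= '~' for ch in string.printable)
print(sentinel_is_largest)  # True
```

Note this only holds for ASCII; any non-ASCII character would compare greater than `~` and the trick would break.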

0 TACO~ AT~ False # because 'T' > 'A'
0 TACO~ T~ True   # because 'T' == 'T', then 'A' < '~'
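If it helps to see the rule spelled out, here is a rough sketch of what `<` does for two strings. `lex_less` is a hypothetical helper written for this answer, not something from the original code, and it only mirrors the built-in behaviour:

```python
def lex_less(x, y):
    """Hand-rolled version of x < y for strings (illustrative only)."""
    for cx, cy in zip(x, y):
        if cx != cy:
            # The first differing character decides the result.
            return ord(cx) < ord(cy)
    # No difference found: one string is a prefix of the other,
    # so the shorter one is the smaller one.
    return len(x) < len(y)

a, b = 'TACO~', 'CAT~'
print(lex_less(a[0:], b[0:]))  # False: 'T' > 'C'
print(lex_less(a[0:], b[2:]))  # True:  'T' == 'T', then 'A' < '~'
print(lex_less(a[4:], b[2:]))  # False: '~' > 'T'
```

For the JavaScript port, `<` on two strings is also lexicographic (by UTF-16 code unit), so `a.slice(i) < b.slice(j)` should give the same results for ASCII input.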
iamanigeeit