
Please explain:

>>> [line.rstrip() for line in open('foo')]
['', 'hello text file6', '', '', '', 'goodby text file7', '', 'bye', '']

>>> with open('foo') as f: [line.rstrip() for line in f if line.rstrip()[-1:].isdigit()]
... 
['hello text file6', 'goodby text file7']

[-1:] skips the empty strings, while the list comprehension above includes them. So far I have assumed that slices work only within a single string, but here the [-1:] slice seems to cross the boundaries between many strings.

Vadim Kotov
  • You printed lines where `line` is non-empty and `line[-1].isdigit()`. What is unexpected here? – Norrius Apr 02 '18 at 13:30
  • line[-1] gives the error 'IndexError: string index out of range' for empty strings – Vladimir Zolotykh Apr 02 '18 at 13:33
  • That's because [slices are not indices](https://stackoverflow.com/questions/9490058/why-substring-slicing-index-out-of-range-works-in-python) and work a bit different. In particular, an out-of-bounds slice is an empty sequence. Was that your question? – Norrius Apr 02 '18 at 13:35
  • I still don't quite understand. We apply the slice to a single string, so somewhere there must have been some implicit concatenation of strings, and the slice was applied to that instead of the original empty '' string. Either it's complicated or I have missed something important. – Vladimir Zolotykh Apr 02 '18 at 13:40
  • So the slice here is just a workaround to avoid the 'out of range' error. – Vladimir Zolotykh Apr 02 '18 at 14:05

2 Answers


Breaking things down:

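  • Slices ([::] syntax) clamp their bounds to the length of the sequence, so they return whatever elements fall inside the range (possibly none) and never raise an IndexError. That means all of these are true: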

    • ['cat'][0:1] == ['cat']
    • ['cat'][1:2] == []
    • ['cat'][-1:2] == ['cat']
    • ['cat'][-10000:] == ['cat']
  • Accessing something by index ([x]), on the other hand, raises an IndexError if the index does not exist. That means these fail:

    • ['cat'][1]
    • ['cat'][-2]
  • Your comprehension means "take all lines that meet the following":

    • After rstrip() they are not empty (remember, an empty string is NOT a digit: ''.isdigit() is False)
    • The substring representing the last character is a digit
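
The slice and index rules listed above can be checked directly. This is a small Python 3 sketch; the helper name raises_index_error is hypothetical, introduced only for illustration:

```python
# Slices clamp out-of-range bounds and return whatever elements remain (possibly none).
assert ['cat'][0:1] == ['cat']
assert ['cat'][1:2] == []
assert ['cat'][-1:2] == ['cat']
assert ['cat'][-10000:] == ['cat']


def raises_index_error(seq, index):
    """Return True if seq[index] raises IndexError (hypothetical helper)."""
    try:
        seq[index]  # plain indexing is not clamped
    except IndexError:
        return True
    return False


# Indexing a position that does not exist raises instead of returning nothing.
failing = [i for i in (0, -1, 1, -2) if raises_index_error(['cat'], i)]
print(failing)  # [1, -2]
```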

You might want this instead:

 [line.rstrip() for line in f if not line.rstrip() or line.rstrip()[-1].isdigit()]

That will include your blank lines.
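
To compare the two filters without a file on disk, the sketch below uses io.StringIO, which iterates line by line like an open file. The SAMPLE text is a hypothetical stand-in for 'foo', not its real contents, and the function names are illustrative:

```python
import io

# Hypothetical stand-in for the file 'foo' (includes an empty and a whitespace-only line).
SAMPLE = "hello text file6\n\n   \ngoodby text file7\nbye\n"


def digit_lines(f):
    """Original filter: keep lines whose last non-whitespace character is a digit."""
    return [line.rstrip() for line in f if line.rstrip()[-1:].isdigit()]


def digit_or_blank_lines(f):
    """Alternative filter: also keep lines that are empty after rstrip()."""
    return [line.rstrip() for line in f
            if not line.rstrip() or line.rstrip()[-1].isdigit()]


print(digit_lines(io.StringIO(SAMPLE)))
# ['hello text file6', 'goodby text file7']
print(digit_or_blank_lines(io.StringIO(SAMPLE)))
# ['hello text file6', '', '', 'goodby text file7']
```

In the alternative, the short-circuiting `or` guarantees that line.rstrip()[-1] is only evaluated on a non-empty string, so the plain index is safe there.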


As a clarification, at no point are you getting values from outside the string: ''[-123:] will pass (it evaluates to ''), while ''[-123] (no colon) will fail with an IndexError.
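
One way to see that a slice never reaches outside the string is the built-in slice.indices(length) method, which returns the (start, stop, step) a slice resolves to once its bounds are clamped to a sequence of that length. A short sketch:

```python
# slice(-123, None) is the object behind the [-123:] syntax.
s = slice(-123, None)

print(s.indices(0))  # (0, 0, 1): for '' the clamped range is empty
print(s.indices(3))  # (0, 3, 1): for 'abc' it covers the whole string

assert ''[s] == ''[-123:] == ''
assert 'abc'[s] == 'abc'

try:
    ''[-123]  # a plain index is not clamped
except IndexError as exc:
    print('IndexError:', exc)
```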

cwallenpoole

There is no implicit concatenation; slices don't do that. Consider this example:

lines = ['', 'abc', 'xyz123']

for line in lines:
    last = line.rstrip()[-1:]  # '' for an empty string, never an IndexError
    print(repr(last), last.isdigit())

Output:

'' False
'c' False
'3' True

The unexpected part might be how it handles the empty string. Any slice of the empty string is an empty string, because an out-of-bounds slice is an empty sequence. Then, str.isdigit is defined to return False for empty strings (it requires at least one character), so these are filtered out of your list.
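
Put differently, the [-1:] check behaves like "non-empty, and the last character is a digit". The sketch below compares the slice form with an explicit emptiness check followed by a plain index; both function names are hypothetical:

```python
def ends_with_digit_slice(s):
    # s[-1:] is '' for an empty string, and ''.isdigit() is False, so no IndexError.
    return s[-1:].isdigit()


def ends_with_digit_index(s):
    # s[-1] would raise IndexError on '', so the emptiness check must come first.
    return bool(s) and s[-1].isdigit()


for line in ['', 'abc', 'xyz123', '   ']:
    stripped = line.rstrip()
    # Both forms agree on every input, including the empty string.
    assert ends_with_digit_slice(stripped) == ends_with_digit_index(stripped)
    print(repr(stripped), ends_with_digit_slice(stripped))
```

The slice form folds the emptiness check into the slice itself, which is why it works as a compact guard in a comprehension.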

Norrius
  • Slices are less strict than indexes (they return empty strings instead of raising an error), and empty strings are filtered out by my comprehension. Your "Any slice of the empty string will be an empty string because an out-of-bounds slice is an empty sequence" is much more to the point. – Vladimir Zolotykh Apr 02 '18 at 14:11