In every solution I find for reading a file in Python, the newline character has to be stripped off each line afterwards:

In [5]: [line for line in open("text.txt", "r")]
Out[5]: ['line1\n', 'line2']

Judging by the popularity of some questions about this (1, 2), the intuitive behavior would be to yield just the stripped lines.

What is the rationale behind this?
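
For reference, the stripping step those solutions add looks roughly like this. It is a minimal sketch: `io.StringIO` stands in for `open("text.txt")` so the snippet runs without a file on disk, and the contents are assumed to match the two-line example above.

```python
import io

# io.StringIO stands in for open("text.txt"); the contents are assumed
# to match the two-line example above.
f = io.StringIO("line1\nline2")

# Iteration keeps the line terminator, so it is removed explicitly.
stripped = [line.rstrip("\n") for line in f]
print(stripped)  # ['line1', 'line2']
```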

TylerH
xtofl
  • 3
    "Lots of people don't expect to get newlines" doesn't necessarily mean that's the intuitive design. Maybe newline expecters outnumber newline unexpecters by a hundred to one - you just don't know it because none of them make posts on SO saying "I used `for line in file` and it gave me exactly what I thought it would" – Kevin Jan 22 '16 at 14:09
  • 2
    I would read the [Zen of Python](https://www.python.org/dev/peps/pep-0020/) - where it states "Explicit is better than implicit.". Implicitly stripping off new lines may not work for some cases, e.g. writing the lines out to another file. – AChampion Jan 22 '16 at 14:10
  • 1
    I believe some large, important body decided that a line is a sequence of characters ending with a pre-defined character or set of characters (a newline). This means "line1" is not actually a line, and also why some people consider files without a trailing newline invalid. – Sam McCreery Jan 22 '16 at 14:18
  • 1
    https://docs.python.org/2/library/stdtypes.html#str.splitlines Check this out, captain obvious: "Return a list of the lines in the string, [...] Line breaks are not included in the resulting list". I think this is a valid question about design, and if the answer is unknown, it shouldn't be "because obviously a line has a '\n' char at the end". – dyeray Jan 22 '16 at 14:24

2 Answers

1

Well, this is a line. A line is defined by ending with the character \n. If a sequence of characters did not end with a \n (or EOF) how could we know it was a line?

"hello world"
"hello world\n"

The first is not a line; if we print it twice, we might get:

hello worldhello world

While the second version will give us:

hello world
hello world
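
One practical consequence can be sketched as follows. This is an illustration, not a statement of the documented design rationale: because each item keeps its terminator, concatenating the items reproduces the original text exactly, including whether the last line ended with a newline. `io.StringIO` and the sample text are assumptions made so the snippet is self-contained.

```python
import io

text = "hello world\nhello world"  # note: no trailing newline

# io.StringIO is iterated line by line, like a text file object.
lines = list(io.StringIO(text))

# Joining the pieces gives back the input unchanged.
roundtrip = "".join(lines)

# With the terminators stripped, the information about the missing
# final newline is lost, so a naive rebuild differs from the input.
stripped = [line.rstrip("\n") for line in lines]
lossy = "\n".join(stripped) + "\n"

print(roundtrip == text)  # True
print(lossy == text)      # False
```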
beoliver
  • 1
    I would expect if you iterate over an iterable object, the element that separates every item wouldn't be included at the end of each item. For example, on the csv module, you can also separate the elements iterating over them, and the commas and newlines don't appear in the result. – dyeray Jan 22 '16 at 14:20
0

Migrating the asker's response/solution from the question to an answer:

Granted: 'intuitive' is subjective. 'Consistent', however, is less so. Apparently the 'line' concept in "line1\nline2".splitlines() differs from the one used by iter(open("text.txt")):

>>> assert(open("text.txt").readlines() == \
... open("text.txt").read().splitlines())
AssertionError

Pretty sure people do get caught by this.
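
The two conventions can be compared side by side. The snippet below is a small sketch that uses `io.StringIO` in place of `open("text.txt")` (an assumption made to keep it self-contained) and shows that `str.splitlines` matches `readlines()` for this input only when `keepends=True` is passed.

```python
import io

text = "line1\nline2"

# File-style reading keeps the terminators.
from_file = io.StringIO(text).readlines()

# str.splitlines drops them by default ...
without_ends = text.splitlines()

# ... and keeps them when asked to.
with_ends = text.splitlines(keepends=True)

print(from_file)     # ['line1\n', 'line2']
print(without_ends)  # ['line1', 'line2']
print(with_ends)     # ['line1\n', 'line2']
```

Even with `keepends=True` the two are not interchangeable in general: `str.splitlines` also splits on characters such as `\x0c` (form feed) and `\u2028`, which file iteration does not treat as line boundaries.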

So I was mistaken: maybe my intuition simply matches the splitlines interpretation, in which the split pieces do not include the separators. Maybe the answer to my question is not technical, but more like "PEP-xyz was approved by different people than PEP-qrs". Maybe I should post it to some Python language forum.

TylerH