
How can I read only the first character of each line without reading the whole line, using Python? For example, if I have a file like:

apple  
pear  
watermelon 

In each iteration I must store only the first letter of the line. The result of the program should be ["a","p","w"]. I tried to use file.seek(), but how can I move it to the next line?

martineau
deethereal

3 Answers

2

ti7's answer is great, but if the lines might be too long to hold in memory, you may prefer to read character by character so that the whole line is never stored:

from pathlib import Path
from typing import Iterator

NEWLINE_CHAR = {'\n', '\r'}


def first_chars(file_path: Path) -> Iterator[str]:
    with open(file_path) as fh:
        new_line = True
        while c := fh.read(1):
            if c in NEWLINE_CHAR:
                new_line = True
            elif new_line:
                yield c
                new_line = False

Test:

path = Path('/some/path/a.py')
easy_first_chars = [l[0] for l in path.read_text().splitlines() if l]
smart_first_chars = list(first_chars(path))
assert smart_first_chars == easy_first_chars
Yam Mesicka
  • Ah, unfortunately, I suspect this will be incredibly slow relative to what it could be (it creates a new Python string for every character in the file!), which will be noticeable if the file has so much content on a single line that it cannot practically be read into memory. For such a case, I suspect loading very big blocks, splitting them, and handling a `\n` that falls on a block boundary might be ideal; there's some splitting analysis [here](https://stackoverflow.com/a/42373311/4541045). Additionally, the default `open` arguments translate all newlines to `\n` regardless of what they were initially! – ti7 Apr 16 '21 at 07:03
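The block-based idea from the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `first_chars_chunked` and the `chunk_size` parameter are hypothetical, the file is opened in text mode (so every newline arrives as `\n`), and blank lines are skipped rather than reported.

```python
from pathlib import Path
from typing import Iterator


def first_chars_chunked(file_path: Path, chunk_size: int = 1 << 20) -> Iterator[str]:
    """Yield the first character of every non-empty line, reading in blocks.

    Hypothetical helper: the name and chunk_size default are illustrative only.
    """
    at_line_start = True  # carries the "just saw a newline" state across blocks
    with open(file_path) as fh:  # text mode: newlines are translated to '\n'
        while chunk := fh.read(chunk_size):
            pos = 0
            while pos < len(chunk):
                if at_line_start:
                    if chunk[pos] != "\n":
                        yield chunk[pos]
                        at_line_start = False
                    pos += 1  # consume the character (or a blank line's newline)
                else:
                    nl = chunk.find("\n", pos)
                    if nl == -1:
                        break  # the rest of this block belongs to the current line
                    at_line_start = True
                    pos = nl + 1
```

Only one block of at most `chunk_size` characters is held at a time, and `str.find` does the newline search in C instead of a Python-level loop per character. Whether this is actually faster for a given file would need measuring.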
0

You can read one letter at a time with file.read(1):

letters = []
# Initialized to '\n' so that the first letter of the file is stored
previous = '\n'

# filepath is the path of the file to read
with open(filepath, "r") as file:
    while True:
        # Read only one character
        letter = file.read(1)
        if letter == '':
            break  # end of file
        elif previous == '\n' and letter != '\n':
            # Store the first character after a newline, skipping blank lines
            letters.append(letter)

        previous = letter

Joan Puigcerver
0

File-like objects are iterable, so you can loop over them directly, line by line:

collection = []

with open("input.txt") as fh:
    for line in fh:  # iterate by-lines over file-like
        line = line.rstrip("\n")  # a blank line arrives as "\n", not ""
        try:
            collection.append(line[0])  # get the first char in the line
        except IndexError:  # line has no chars
            pass  # consider other handling

# work with collection

You may also consider enumerate() if you care which line a particular value came from, or yielding line[0] to form a generator (which may be more efficient if processing can halt before the entire file has been read):

def my_generator():
    with open("input.txt") as fh:
        for lineno, line in enumerate(fh, 1):  # lines are commonly 1-indexed
            line = line.rstrip("\n")  # a blank line arrives as "\n", not ""
            try:
                yield lineno, line[0]  # first char in the line
            except IndexError:  # line has no chars
                pass  # consider other handling

for lineno, first_letter in my_generator():
    # work with lineno and first_letter here and break when done
    print(lineno, first_letter)
ti7
  • So there's no way to read only one letter? I need to read the whole line anyway? – deethereal Apr 15 '21 at 21:17
  • _sort of_ .. if you know exactly how many characters are in each line, you can `.seek()` forward by the appropriate amount each time (this moves the file pointer), but this is not a common case, may not really be more efficient, and depends on the file encoding (the file is loaded into memory in blocks of the buffer size, often 4096 bytes, and not all chars are 8 bits wide, etc.) .. You could also hunt for the `\n` chars yourself, but practically, iterating will do this for you. The file will not be entirely stored in memory in this case. Using `fh.read()` with no argument will bring the entire file into memory. – ti7 Apr 15 '21 at 21:20
  • ty, for your reply – deethereal Apr 15 '21 at 21:23
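
For the uncommon fixed-width case mentioned in the comments, a minimal sketch might look like the one below. It rests on strong assumptions that are not in the original question: every line occupies exactly `line_len` bytes including its newline, and the first byte of each line is ASCII. The function name `first_bytes_fixed_width` is hypothetical.

```python
from typing import List


def first_bytes_fixed_width(path: str, line_len: int) -> List[str]:
    """Collect the first character of each line by seeking, not scanning.

    Assumes every line is exactly line_len bytes long, newline included,
    and that each line starts with an ASCII character.
    """
    result = []
    with open(path, "rb") as fh:  # binary mode: seek offsets are plain byte counts
        offset = 0
        while True:
            fh.seek(offset)
            b = fh.read(1)
            if not b:
                break  # reached end of file
            result.append(b.decode("ascii"))
            offset += line_len  # jump straight to the start of the next line
    return result
```

With the sample data from the question this does not apply directly, because apple, pear and watermelon have different lengths; for ordinary text files, iterating over the file object remains the practical choice.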