In Python, given the name of a file, how can I write a loop that reads one character each time through the loop?
15 Answers
with open(filename) as f:
    while True:
        c = f.read(1)
        if not c:
            print("End of file")
            break
        print("Read a character:", c)

- Since this is reading a byte at a time, won't it fail for non-ASCII encodings? – David Chouinard Jan 15 '13 at 18:30
- Question and answers are confusing the character and byte concepts. If the file uses a single-byte-per-character encoding such as ASCII, then yes, reading a single byte-sized chunk reads a single character; but if the encoding requires more than one byte per character, you are reading a single byte, not a single character. – Basel Shishani Oct 16 '13 at 01:34
- That's right. Therefore, I often do `result = open(filename).read()` and then read `result` character by character. – Shravan Jun 30 '15 at 17:27
- To David Chouinard's question: this snippet works correctly in Python 3 with a file in UTF-8 encoding. If your file is in, for example, Windows-1250 encoding, just change the first line to `with open(filename, encoding='Windows-1250') as f:` – SergO Jan 21 '16 at 09:17
- To add to SergO: `open(filename, "r")` vs. `open(filename, "rb")` can result in different numbers of iterations (at least in Python 3). In "r" mode, a single `f.read(1)` may consume multiple bytes from disk when it hits a multi-byte character. – dcc310 Dec 08 '17 at 23:06
- I did this with a UTF-8 file that had multi-byte characters, and `f.read(1)` correctly pulls the whole character, not just one byte. Got my upvote. – rkechols Oct 28 '21 at 18:14
- @BaselShishani In 3.x, `.read()` on a file opened in text mode reads a specified number of characters, and strings are always Unicode, so there is no problem. The question was tagged in a version-agnostic way, and 3.x releases existed when the question was asked. – Karl Knechtel Jan 12 '23 at 06:28
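As the comments above note, the distinction is between characters (text mode) and bytes (binary mode). A minimal sketch of the difference, assuming a throwaway UTF-8 file created just for the demonstration:

```python
# Text mode yields one *character* per read(1); binary mode yields one *byte*.
# "sample.txt" is a made-up file name, written here so the sketch runs as-is.
with open("sample.txt", "w", encoding="utf-8") as f:
    f.write("é")  # U+00E9: a single character, two bytes in UTF-8

with open("sample.txt", encoding="utf-8") as f:
    print(repr(f.read(1)))  # 'é' (the full character)

with open("sample.txt", "rb") as f:
    print(repr(f.read(1)))  # b'\xc3' (only the first of the two bytes)
```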
First, open a file:
with open("filename") as fileobj:
    for line in fileobj:
        for ch in line:
            print(ch)
This goes through every line in the file and then every character in that line.
- Agreed, this seems like the more Pythonic way to go about it. Wouldn't this take care of handling non-ASCII encodings as well? – Ron7 Nov 28 '17 at 09:19
- One reason you might read a file one character at a time is that the file is too big to fit in memory. But the answer above assumes each line can fit in memory. – C S Feb 11 '18 at 22:44
- Since the OP never mentioned reading the whole file one char at a time, this approach is non-optimal because the whole file could be contained on a single line, in which case considerable time is taken to read that whole line in before char processing is done. Best to use f.read(1) for partial reads in these cases. – owl7 Jul 07 '21 at 12:02
- -1. Seconding @C S's comment. The OP asked how to read "a single character at a time", so this doesn't answer the question. This is not simpler than the accepted answer, and it's best to have a function that won't sometimes unnecessarily crash your script/application. What if it's an SQL INSERT for a full table? Or uses a non-native newline character? Best case is inefficient buffering; worst case is running out of memory. – Douglas Myers-Turnbull Jul 13 '21 at 20:46
I like the accepted answer: it is straightforward and will get the job done. I would also like to offer an alternative implementation:
def chunks(filename, buffer_size=4096):
    """Reads `filename` in chunks of `buffer_size` bytes and yields each chunk
    until no more characters can be read; the last chunk will most likely have
    less than `buffer_size` bytes.

    :param str filename: Path to the file
    :param int buffer_size: Buffer size, in bytes (default is 4096)
    :return: Yields chunks of `buffer_size` size until exhausting the file
    :rtype: str
    """
    with open(filename, "rb") as fp:
        chunk = fp.read(buffer_size)
        while chunk:
            yield chunk
            chunk = fp.read(buffer_size)

def chars(filename, buffersize=4096):
    """Yields the contents of file `filename` character-by-character. Warning:
    will only work for encodings where one character is encoded as one byte.

    :param str filename: Path to the file
    :param int buffersize: Buffer size for the underlying chunks,
        in bytes (default is 4096)
    :return: Yields the contents of `filename` character-by-character.
    :rtype: char
    """
    for chunk in chunks(filename, buffersize):
        for char in chunk:
            yield char

def main(buffersize, filenames):
    """Reads several files character by character and redirects their contents
    to `/dev/null`.
    """
    for filename in filenames:
        with open("/dev/null", "wb") as fp:
            for char in chars(filename, buffersize):
                fp.write(char)

if __name__ == "__main__":
    # Try reading several files varying the buffer size
    import sys

    buffersize = int(sys.argv[1])
    filenames = sys.argv[2:]
    sys.exit(main(buffersize, filenames))
The code I suggest is essentially the same idea as your accepted answer: read a given number of bytes from the file. The difference is that it first reads a good chunk of data (4096 is a good default for x86, but you may want to try 1024 or 8192; any multiple of your page size), and then yields the characters in that chunk one by one.
The code I present may be faster for larger files. Take, for example, the entire text of War and Peace, by Tolstoy. These are my timing results (MacBook Pro using OS X 10.7.4; so.py is the name I gave to the code I pasted):
$ time python so.py 1 2600.txt.utf-8
python so.py 1 2600.txt.utf-8 3.79s user 0.01s system 99% cpu 3.808 total
$ time python so.py 4096 2600.txt.utf-8
python so.py 4096 2600.txt.utf-8 1.31s user 0.01s system 99% cpu 1.318 total
Now: do not take the 4096 buffer size as a universal truth; look at the results I get for different sizes (buffer size (bytes) vs. wall time (sec)):
2 2.726
4 1.948
8 1.693
16 1.534
32 1.525
64 1.398
128 1.432
256 1.377
512 1.347
1024 1.442
2048 1.316
4096 1.318
As you can see, gains appear quite early on (and my timings are likely very inaccurate); the buffer size is a trade-off between performance and memory. The default of 4096 is just a reasonable choice, but, as always, measure first.
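If you want to repeat the measurement yourself, here is a minimal timing sketch. The file name is a stand-in (point it at a large file, such as the War and Peace text above, for meaningful numbers); a small dummy file is created so the sketch runs as-is.

```python
import time

def drain(filename, buffersize):
    # Read the file through in chunks of `buffersize` bytes, discarding data.
    with open(filename, "rb") as fp:
        chunk = fp.read(buffersize)
        while chunk:
            chunk = fp.read(buffersize)

# Hypothetical stand-in file; replace with a genuinely large file to benchmark.
with open("timing_demo.txt", "wb") as fp:
    fp.write(b"x" * 100_000)

for size in (1, 64, 4096):
    start = time.perf_counter()
    drain("timing_demo.txt", size)
    print(size, round(time.perf_counter() - start, 3))
```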

Python itself can help you with this, in interactive mode:

>>> help(file.read)
Help on method_descriptor:

read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.

(Note: the built-in `file` type only exists in Python 2; in Python 3, call `help(f.read)` on an open file object instead.)

- I agree with the sentiment, but perhaps this is better suited as a comment to the OP? – Mike Boers Jun 07 '10 at 12:56
- Might be, but I think all that text would look messy in a comment. – Mattias Nilsson Jun 07 '10 at 13:23
Just:
myfile = open(filename)
onecharacter = myfile.read(1)

I learned a new idiom for this today while watching Raymond Hettinger's Transforming Code into Beautiful, Idiomatic Python:
import functools

with open(filename) as f:
    f_read_ch = functools.partial(f.read, 1)
    for ch in iter(f_read_ch, ''):
        print('Read a character:', repr(ch))
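One detail worth knowing about this `iter(callable, sentinel)` idiom: the sentinel must match the type that `read()` returns, so for a file opened in binary mode it is `b''` rather than `''`. A small sketch (the demo file name is made up, and the file is created here so the sketch runs as-is):

```python
import functools

# Create a tiny binary file for the demonstration.
with open("demo.bin", "wb") as f:
    f.write(b"abc")

with open("demo.bin", "rb") as f:
    read_byte = functools.partial(f.read, 1)
    for b in iter(read_byte, b''):  # sentinel is b'' in binary mode
        print(repr(b))
```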

This will also work:

with open("filename") as fileObj:
    for line in fileObj:
        for ch in line:
            print(ch)

It goes through every line in the file and every character in every line.
(Note that this post now looks extremely similar to a highly upvoted answer, but this was not the case at the time of writing.)

- -1. This is a bad general approach because it loads potentially massive lines into memory. Plus, it's not simpler than the accepted answer. What if it's a 100-billion-length nucleotide sequence (ATGC)? Or an SQL INSERT for a full table? Or uses a non-native newline character? Best case is inefficient buffering; worst case is running out of memory. – Douglas Myers-Turnbull Jul 13 '21 at 20:43
- Very true; this is not efficient. But for beginners to Python, this is often an easy for-loop method that immediately makes sense. – Pro Q Jul 13 '21 at 20:53
Best answer for Python 3.8+:

with open(path, encoding="utf-8") as f:
    while c := f.read(1):
        do_my_thing(c)

You may want to specify utf-8 and avoid the platform default encoding. I've chosen to do that here.

Function – Python 3.8+:

def stream_file_chars(path: str):
    with open(path, encoding="utf-8") as f:
        while c := f.read(1):
            yield c

Function – Python <= 3.7:

def stream_file_chars(path: str):
    with open(path, encoding="utf-8") as f:
        while True:
            c = f.read(1)
            if c == "":
                break
            yield c

Function – pathlib + documentation:

from pathlib import Path
from typing import Union, Generator

def stream_file_chars(path: Union[str, Path]) -> Generator[str, None, None]:
    """Streams characters from a file."""
    with Path(path).open(encoding="utf-8") as f:
        while (c := f.read(1)) != "":
            yield c
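For completeness, a usage sketch of the generator above (the file name and contents are made up, and the generator is repeated so the sketch is self-contained; requires Python 3.8+ for the walrus operator):

```python
def stream_file_chars(path):
    # Same generator as above, repeated for a self-contained demo.
    with open(path, encoding="utf-8") as f:
        while c := f.read(1):
            yield c

# Hypothetical demo file.
with open("example.txt", "w", encoding="utf-8") as f:
    f.write("abc")

print(list(stream_file_chars("example.txt")))  # ['a', 'b', 'c']
```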

f = open('hi.txt', 'w')
f.write('0123456789abcdef')
f.close()

f = open('hi.txt', 'r')  # reopen the same file for reading
f.seek(12)
print(f.read(1))  # This will read just "c"

- Welcome to Stack Overflow! You should elaborate – why is this an answer? – davidkonrad Feb 25 '15 at 22:37
You should try `f.read(1)`, which is definitely correct and the right thing to do.

As a supplement: if you are reading a file that contains a very long line, which might exhaust your memory, you might consider reading it into a buffer and then yielding each character.

def read_char(inputfile, buffersize=10240):
    with open(inputfile, 'r') as f:
        while True:
            buf = f.read(buffersize)
            if not buf:
                break
            for char in buf:
                yield char
    yield ''  # handle the case where the file is empty

if __name__ == "__main__":
    for char in read_char('./very_large_file.txt'):
        process(char)  # `process` stands in for your per-character handling

import os
import sys

os.system("stty -icanon -echo")  # put the terminal in unbuffered, no-echo mode
while True:
    raw_c = sys.stdin.buffer.peek()
    c = sys.stdin.read(1)
    print(f"Char: {c}")

Combining qualities of some other answers, here is something that is invulnerable to long files / lines, while being more succinct and faster:
import functools as ft, itertools as it

with open(path) as f:
    for c in it.chain.from_iterable(
        iter(ft.partial(f.read, 4096), '')
    ):
        print(c)

# Read the whole file into memory at once, then print it one character at a time
f = open('file.txt')
for i in list(f.read()):
    print(i)

- While this might answer the author's question, it lacks some explaining words and links to documentation. Raw code snippets are not very helpful without some phrases around them. You may also find [how to write a good answer](https://stackoverflow.com/help/how-to-answer) very helpful. Please edit your answer. – hellow Sep 10 '18 at 09:10
- -1. Casting to list unnecessarily loads the whole thing into memory, which can cause OOM and/or inefficient buffering. The OP asked how to read "a single character at a time", so this doesn't answer the question. – Douglas Myers-Turnbull Jul 13 '21 at 20:48