Can I encode a string, save it to a file, read it back, and decode it using Python 3?

Question

Is it possible to take a string with accented characters, store it in a local file, read it from that file, and restore it to its original form?

I have been trying to encode the string using utf-8. The write() method only takes str arguments. The decode() method only takes bytes arguments. I can't write to the file unless I encode the data, but I can't restore it.

Here is the code I am trying to run:

unicode = "utf-8"
name = "Dončić"
with open("doncic", 'w') as file:
    file.write(str(name.encode(unicode)))

with open("doncic", 'r', encoding='utf8') as file:
    print(file.read())

I've been searching for an answer for hours, and none of the solutions I've found have included any file i/o.

This is my first post! Thank you for your help!

If you're using Python 3 (which you should be), you shouldn't need to use `encode` or `decode` at all; just write the string to the file, and read it back. Calling `str` on the string before writing it is also unnecessary; that just creates a string representation of the given data, which is unnecessary when the data is already a string. If that doesn't work, please include more details, including the expected output and the output you're actually getting. — jirassimok, Nov 14 '19 at 06:40
I am using Python 3. If I try to write the string "Dončić" to the file, it gives me this error: UnicodeEncodeError: 'charmap' codec can't encode character '\u010d' in position 3: character maps to <undefined>. Edit: After searching further for that error, I found a question that was helpful. https://stackoverflow.com/a/42495690/12370750 — Jake Bushlack, Nov 14 '19 at 06:52
Possible duplicate of [UnicodeEncodeError: 'charmap' codec can't encode characters](https://stackoverflow.com/questions/27092833/unicodeencodeerror-charmap-codec-cant-encode-characters) — jirassimok, Nov 14 '19 at 17:03
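For anyone landing here with the same error, here is a minimal sketch that reproduces it without touching a file. It assumes the default encoding on the asker's machine was something like cp1252 (the thread never says which one it was); the `'charmap'` in the message only tells us it was a single-byte code page with no slot for 'č'.

```python
name = "Dončić"

# Assumption: the platform default was a code page such as cp1252.
# It has no byte for 'č' (U+010D), so strict encoding fails there.
try:
    name.encode("cp1252")
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the first character that could not be encoded
    print(exc)

# UTF-8 has a byte sequence for every character in this name.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes)
```

The same thing happens inside `open(..., 'w')` when no `encoding` is given and the default cannot represent the text, which is why passing `encoding='utf-8'` explicitly makes the error go away.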

score 1 · Answer 1 · edited Nov 14 '19 at 09:16

Python can open files in two modes, text or binary. Text mode handles the encoding for you, so you can read and write strings directly, including any non-ASCII characters.

Text mode, encoding handled by Python:

with open('text.txt', 'w', encoding='utf-8') as f:
    f.write('Hellø Wőrld')

# read back
with open('text.txt', encoding='utf-8') as f:
    print(f.read())

Binary mode, encoding handled by you:

with open('text.txt', 'wb') as f:
    f.write('Hellø Wőrld'.encode('utf-8'))

# read back
with open('text.txt', 'rb') as f:
    print(f.read().decode('utf-8'))
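To connect this back to the code in the question, here is a small sketch (file names and the temporary directory are only for illustration). It shows what `str(name.encode(...))` actually produces, and that the text-mode and binary-mode approaches above put the same bytes on disk.

```python
import os
import tempfile

name = "Dončić"

# What the question's code wrote: the printable repr of a bytes object,
# not the encoded text itself.
as_repr = str(name.encode("utf-8"))
print(as_repr)

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text.txt")
    binary_path = os.path.join(tmp, "binary.txt")

    # Text mode: Python encodes for us.
    with open(text_path, "w", encoding="utf-8") as f:
        f.write(name)

    # Binary mode: we encode ourselves.
    with open(binary_path, "wb") as f:
        f.write(name.encode("utf-8"))

    # Compare the raw bytes of both files.
    with open(text_path, "rb") as f:
        text_mode_bytes = f.read()
    with open(binary_path, "rb") as f:
        binary_mode_bytes = f.read()

    # Reading in text mode with the same encoding restores the string.
    with open(text_path, encoding="utf-8") as f:
        restored = f.read()

print(text_mode_bytes == binary_mode_bytes, restored)
```

So the two modes differ only in who calls `encode`/`decode`; wrapping the bytes in `str()` is what made the original data hard to restore.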

fyaa · Answer 2 · 2019-11-14T11:26:38.197

-1

One option is to write and read the file in binary mode.

unicode = "utf-8"
name = "Dončić"
with open("doncic", 'wb') as file:
    file.write(name.encode(unicode))

with open("doncic", 'rb') as file:
    print(file.read().decode(unicode))

The second (and potentially simpler) option is to use the encoding parameter of open.

unicode = "utf-8"
name = "Dončić"
with open('text.txt', 'w', encoding=unicode) as file:
    file.write(name)

with open('text.txt', 'r', encoding=unicode) as file:
    print(file.read())

A third option is to leave out the encoding if your default or preferred encoding is already utf-8.

import locale

unicode = "utf-8"
name = "Dončić"

assert locale.getpreferredencoding().upper() == 'UTF-8'

with open('text.txt', 'w') as file:
    file.write(name)

with open('text.txt', 'r') as file:
    print(file.read())

An assertion may or may not be the right way to verify this in your program, but it shows that this can work out of the box when the default encoding is already UTF-8.

I ran all snippets using Python 3.7.

The output in all cases is:

Dončić
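If you are unsure whether the third option applies on your machine, a quick check along these lines may help. This is only a sketch: the probe file name is made up, and the exact rules for the default differ between Python versions and settings, so treat the printed values as information about your own setup rather than a guarantee.

```python
import locale
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "probe.txt")  # throwaway file, name is arbitrary
    # No encoding argument on purpose: we want to see what open() picks.
    with open(path, "w") as f:
        default_encoding = f.encoding

print("open() default:", default_encoding)
print("locale preferred:", locale.getpreferredencoding(False))
```

If the first line does not report UTF-8, passing `encoding` explicitly as in the second option is the safer choice.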

edited Nov 14 '19 at 11:26

answered Nov 14 '19 at 07:02

fyaa


Why so complicated? Just open the file in text mode. – MaxNoe Nov 14 '19 at 07:22
This is also part of your solution, so why the downvote? – fyaa Nov 14 '19 at 08:09
Because the first one is the simpler one that should be used if you just want to store strings in a textfile – MaxNoe Nov 14 '19 at 08:12
And now you added Python 2 code one month before its end of life, when the OP specifically said he is using Python 3. – MaxNoe Nov 14 '19 at 10:29
I ran everything with Python 3.7, so it is no Python 2 code. – fyaa Nov 14 '19 at 11:15
The u does nothing in python3 and was only added for compatibility with python2 – MaxNoe Nov 14 '19 at 11:17
Thus, it is still valid Python 3 code. However, I changed the third part accordingly. – fyaa Nov 14 '19 at 11:28
Yes, but it does not "mark the string as unicode". All string literals in python are unicode. Byte literals start with `b"` – MaxNoe Nov 14 '19 at 11:30
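The point about prefixes in these comments can be checked directly. A minimal sketch (variable names are just for illustration):

```python
plain = "Dončić"
prefixed = u"Dončić"   # the u prefix is accepted in Python 3 but changes nothing
raw = b"Doncic"        # bytes literals may only contain ASCII characters

print(type(plain), type(prefixed), type(raw))
print(plain == prefixed)
```

Both string literals are `str`; only the `b` prefix produces a different type.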

SRG · Answer 3 · 2019-11-14T07:21:31.623

-1

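Encode the string, open the file in binary write mode ('wb'), and write the encoded bytes.

To read it back, open the file in binary read mode ('rb') and decode the bytes.

Alternatively, open the file in text read mode ('r' instead of 'rb') and Python will decode the contents for you; pass encoding='utf-8' so it matches what was written.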

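str_original = 'Hello'
filepath = 'hello.txt'  # example path, not given in the original answer

# Encode the string yourself and write the bytes in binary mode.
with open(filepath, 'wb') as f:
    f.write(str_original.encode(encoding='utf-8'))

# Read the bytes back and decode them; decode() defaults to UTF-8.
with open(filepath, 'rb') as f:
    print(f.read().decode())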

edited Nov 14 '19 at 07:21

answered Nov 14 '19 at 07:12

SRG


This is a duplicate of https://stackoverflow.com/a/58851123/1480374 and ignores the non-ascii characters of the string for quick reproducibility. – fyaa Nov 14 '19 at 08:14
I have used my own logic here, and logic may get replicated. So why the downvote? – SRG Nov 14 '19 at 09:16
@fyaa this solution doesn't ignore non-ASCII characters, since the default value for the `errors` parameter is `"strict"`, not `"ignore"`. – lenz Nov 14 '19 at 12:51
@lenz I just meant that the example should incorporate non-ASCII characters, so that you can just copy and paste to try it out. I did indeed not mean that it crashes when you change `str_original`. – fyaa Nov 17 '19 at 16:51
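To illustrate the `errors` parameter mentioned in these comments, here is a short sketch. It encodes to ASCII only because that makes the difference visible; the answer above uses UTF-8, where none of these handlers would be triggered for this name.

```python
name = "Dončić"

# Default behaviour is errors="strict": unencodable characters raise.
try:
    name.encode("ascii")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

# "ignore" silently drops the characters, "replace" substitutes '?'.
ignored = name.encode("ascii", errors="ignore")
replaced = name.encode("ascii", errors="replace")

print(strict_raised, ignored, replaced)
```

Both non-strict handlers lose information, so the original string cannot be restored from their output.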
