2

Is it possible to take a string with accented characters, store it in a local file, read it from that file, and restore it to its original form?

I have been trying to encode the string using utf-8. The write() method only takes str arguments. The decode() method only takes bytes arguments. I can't write to the file unless I encode the data, but I can't restore it.

Here is the code I am trying to run:

unicode = "utf-8"
name = "Dončić"
with open("doncic", 'w') as file:
    file.write(str(name.encode(unicode)))

with open("doncic", 'r', encoding='utf8') as file:
    print(file.read())

I've been searching for an answer for hours, and none of the solutions I've found have included any file I/O.

This is my first post! Thank you for your help!

  • If you're using Python 3 (which you should be), you shouldn't need to use `encode` or `decode` at all; just write the string to the file, and read it back. Calling `str` on the string before writing it is also unnecessary; that just creates a string representation of the given data, which is unnecessary when the data is already a string. If that doesn't work, please include more details, including the expected output and the output you're actually getting. – jirassimok Nov 14 '19 at 06:40
  • I am using Python 3. If I try to write the string "Dončić" to the file, it gives me this error: UnicodeEncodeError: 'charmap' codec can't encode character '\u010d' in position 3: character maps to <undefined>. Edit: After searching further for that error, I found a question that was helpful. https://stackoverflow.com/a/42495690/12370750 – Jake Bushlack Nov 14 '19 at 06:52
  • Possible duplicate of [UnicodeEncodeError: 'charmap' codec can't encode characters](https://stackoverflow.com/questions/27092833/unicodeencodeerror-charmap-codec-cant-encode-characters) – jirassimok Nov 14 '19 at 17:03
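
The two problems raised in these comments can be reproduced without touching a file. The sketch below assumes cp1252 as a stand-in for a legacy platform default encoding (the actual default depends on the system), and shows both why writing the plain string can fail and what `str()` on encoded bytes actually produces:

```python
name = "Dončić"

# Assumption: cp1252 stands in for a legacy Windows default encoding.
try:
    name.encode("cp1252")
    cp1252_ok = True
    failed_at = None
except UnicodeEncodeError as exc:
    cp1252_ok = False
    failed_at = exc.start  # index of the first character that cannot be encoded

# UTF-8 can represent every character in the string, so it round-trips.
utf8_bytes = name.encode("utf-8")
restored = utf8_bytes.decode("utf-8")

# str() on a bytes object returns its repr text (starting with b'...'),
# which is what the code in the question ended up writing to the file.
written = str(utf8_bytes)

print(cp1252_ok, failed_at, restored == name, written)
```

Passing `encoding='utf-8'` to `open()` for both writing and reading avoids both issues.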

3 Answers

1

Python can open files in two modes: text or binary. Text mode handles the encoding for you, so you can read and write strings directly, including non-ASCII characters.

Text mode, encoding handled by Python:

with open('text.txt', 'w', encoding='utf-8') as f:
    f.write('Hellø Wőrld')

# read back
with open('text.txt', encoding='utf-8') as f:
    print(f.read())

Binary mode, encoding handled by you:

with open('text.txt', 'wb') as f:
    f.write('Hellø Wőrld'.encode('utf-8'))

# read back
with open('text.txt', 'rb') as f:
    print(f.read().decode('utf-8'))
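
As a rough check that the two modes are equivalent, the following sketch writes in text mode and reads the raw bytes back in binary mode. The temporary directory and the variable names are illustrative choices, not part of the snippets above:

```python
import os
import tempfile

text = "Hellø Wőrld"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "text.txt")

    # Text mode: Python encodes the string to UTF-8 while writing.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Binary mode: read back the raw bytes that ended up on disk.
    with open(path, "rb") as f:
        raw = f.read()

# The bytes on disk match a manual encode, and decoding restores the string.
same_bytes = raw == text.encode("utf-8")
round_trip = raw.decode("utf-8") == text
print(same_bytes, round_trip, len(text), len(raw))
```

The string has 11 characters but occupies 13 bytes, because `ø` and `ő` each take two bytes in UTF-8.
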
lenz
MaxNoe
-1

One option is to write and read the file in binary mode.

unicode = "utf-8"
name = "Dončić"
with open("doncic", 'wb') as file:
    file.write(name.encode(unicode))

with open("doncic", 'rb') as file:
    print(file.read().decode(unicode))

The second (and potentially simpler) option is to use the encoding parameter of open.

unicode = "utf-8"
name = "Dončić"
with open('text.txt', 'w', encoding=unicode) as file:
    file.write(name)

with open('text.txt', 'r', encoding=unicode) as file:
    print(file.read())

A third option is to leave out the encoding if your default or preferred encoding is already utf-8.

import locale

unicode = "utf-8"
name = "Dončić"

assert locale.getpreferredencoding().upper() == 'UTF-8'

with open('text.txt', 'w') as file:
    file.write(name)

with open('text.txt', 'r') as file:
    print(file.read())

The assertion may or may not be a suitable check for your program, but it shows that this can work out of the box when the platform's preferred encoding is UTF-8.
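
If you would rather inspect the default than assert on it, a minimal sketch along these lines may help. It reads the `encoding` attribute of a file opened without an `encoding` argument, then does the actual round trip with an explicit encoding so the result does not depend on the platform. The temporary paths are illustrative:

```python
import locale
import os
import tempfile

name = "Dončić"

# What the locale reports as the preferred encoding for text data.
preferred = locale.getpreferredencoding(False)

with tempfile.TemporaryDirectory() as tmp:
    probe_path = os.path.join(tmp, "probe.txt")
    data_path = os.path.join(tmp, "doncic.txt")

    # Open without an encoding only to see which one Python picked;
    # nothing is written, so this cannot raise UnicodeEncodeError.
    with open(probe_path, "w") as f:
        default_used = f.encoding

    # Explicit encoding: behaves the same on every platform.
    with open(data_path, "w", encoding="utf-8") as f:
        f.write(name)
    with open(data_path, "r", encoding="utf-8") as f:
        restored = f.read()

print(preferred, default_used, restored == name)
```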

I ran all snippets using Python 3.7.

The output in all cases is:

Dončić

fyaa
-1

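Encode the string, open the file in binary write mode ('wb'), and write the encoded bytes.

To read it back, open the file in binary read mode ('rb') and decode the bytes.

Alternatively, open the file in text read mode ('r' instead of 'rb') and Python will decode the contents for you, provided the encoding it uses matches the one the file was written with.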

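str_original = 'Hello'
filepath = 'hello.txt'  # placeholder path; the original snippet did not define filepath

# Write the UTF-8 encoded bytes in binary mode.
with open(filepath, 'wb') as f:
    f.write(str_original.encode(encoding='utf-8'))

# Read the bytes back and decode them (decode() defaults to UTF-8).
with open(filepath, 'rb') as f:
    print(f.read().decode())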
SRG
  • This is a duplicate of https://stackoverflow.com/a/58851123/1480374 and ignores the non-ascii characters of the string for quick reproducibility. – fyaa Nov 14 '19 at 08:14
  • I used my own logic here, and logic may get replicated. So why the downvote? – SRG Nov 14 '19 at 09:16
  • @fyaa this solution doesn't ignore non-ASCII characters, since the default value for the `errors` parameter is `"strict"`, not `"ignore"`. – lenz Nov 14 '19 at 12:51
  • @lenz I just meant that the example should incorporate non-ASCII characters, so that you can just copy and paste to try it out. I did indeed not mean that it crashes when you change `str_original`. – fyaa Nov 17 '19 at 16:51
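
To illustrate the `errors` parameter mentioned in the comments, here is a small sketch. It uses the ASCII codec purely as a stand-in to make the difference visible, since UTF-8 can encode every character in the string and would never drop anything:

```python
name = "Dončić"

# errors="strict" is the default: unencodable characters raise an exception.
try:
    name.encode("ascii")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

# errors="ignore" silently drops the characters that cannot be encoded.
lossy = name.encode("ascii", errors="ignore")

print(strict_raised, lossy)  # True b'Doni'
```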