I am trying to read a UTF-16 file. The code works fine on my local machine, but when I run it on the AWS Batch service it fails with the error shown below.

import codecs
file_to_split = codecs.open("file_utf_16.txt", 'r+', "UTF-16")

It throws an exception: header 'ascii' codec can't encode character '\ufeff' in position 0: ordinal not in range(128)

I am using Python 3.6.

Muhammad Imran Tariq

1 Answer

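The first few bytes of the file are the BOM (byte order mark), which is the '\ufeff' character named in the error. It is the Unicode character U+FEFF, written at the very beginning of a text file to signal the byte order and, indirectly, which Unicode encoding was used.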

This Wikipedia article will help explain the technical details:

Byte order mark
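As a rough illustration of how the BOM behaves in Python, here is a minimal sketch. The sample string and variable names are made up for the example; only the standard `codecs` constants and the built-in codec names are real.

```python
import codecs

text = "hello"  # hypothetical sample content

# The generic "utf-16" codec writes a BOM first, then the text in the
# platform's native byte order.
encoded = text.encode("utf-16")
has_bom = encoded[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Decoding with "utf-16" consumes the BOM, so it does not show up in the string.
decoded = encoded.decode("utf-16")

# Decoding with an explicit byte order does not strip the BOM; it stays in the
# string as a leading U+FEFF character.
le_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
kept = le_bytes.decode("utf-16-le")

# ASCII cannot represent U+FEFF, so encoding such a string to ASCII fails
# with the same kind of message as in the question.
try:
    kept.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)
```

Note that the message in the question is an encode error, so a BOM character that survived decoding and was later written or printed through an ASCII codec is one plausible way to end up with it.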

You have a couple of choices. This answer shows various methods of handling the BOM for different encodings: Stack Overflow answer

And this answer shows how to read the file into memory properly, which I think is what you want to do: Stack Overflow answer. You can then process the contents as a normal Python string.
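For reference, one way to read the file into a string could look like the sketch below. It is not taken from the linked answers: the helper name `read_utf16_text` and the temporary sample file are assumptions made for illustration, and it assumes the file really is UTF-16 with a BOM at the start.

```python
import os
import tempfile


def read_utf16_text(path):
    """Hypothetical helper: return the contents of a UTF-16 file as a str."""
    # With encoding="utf-16", Python uses the BOM to pick the byte order
    # and leaves it out of the decoded text.
    with open(path, "r", encoding="utf-16") as handle:
        text = handle.read()
    # Defensive step: drop any U+FEFF still at the start, for example when
    # the file happens to contain more than one BOM.
    return text.lstrip("\ufeff")


# Usage demo with a throwaway file, so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "file_utf_16.txt")
    with open(sample_path, "w", encoding="utf-16") as handle:
        handle.write("héllo\n")
    content = read_utf16_text(sample_path)
```

If the text is later printed or written to another file, passing an explicit `encoding` such as `"utf-8"` to `open` avoids relying on the environment's default encoding. That the default is ASCII on the AWS Batch container is an assumption here, not something the question confirms.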

John Hanley