What is the difference between utf-8 and utf-8-sig?

Question

I am trying to encode Bangla words in python using pandas dataframe. But as encoding type, utf-8 is not working but utf-8-sig is. I know utf-8-sig is with BOM(Byte order mark). But why is this called utf-8-sig and how it works?

I believe this is a duplicate: https://stackoverflow.com/questions/2223882/whats-the-difference-between-utf-8-and-utf-8-without-bom — juanpa.arrivillaga, Jul 22 '19 at 20:08
Possible duplicate of [python utf-8-sig BOM in the middle of the file when appending to the end](https://stackoverflow.com/questions/23154355/python-utf-8-sig-bom-in-the-middle-of-the-file-when-appending-to-the-end) — Rory Daulton, Jul 22 '19 at 20:10
I don't find anything related to the 'sig' part. That's why I asked this question. — JuBaer AD, Jul 25 '19 at 06:32

score 65 · Answer 1 · edited Jul 18 '23 at 21:10

"sig" in "utf-8-sig" is the abbreviation of "signature" (i.e. signature utf-8 file). Using utf-8-sig to read a file will treat the BOM as metadata that explains how to interpret the file, instead of as part of the file contents. Read more on Python's codecs documentation page"

To increase the reliability with which a UTF-8 encoding can be detected, Microsoft invented a variant of UTF-8 (that Python calls "utf-8-sig") for its Notepad program: Before any of the Unicode characters is written to the file, a UTF-8 encoded BOM (which looks like this as a byte sequence: 0xef, 0xbb, 0xbf) is written.

What is the difference between utf-8 and utf-8-sig?

1 Answers1

Linked