I'm using Microsoft SQL Server Management Studio to create an XML file. This file needs the prologue <?xml version="1.0" encoding="UTF-8"?> at the top to be uploaded properly. I understand that it's fairly normal for this line to be missing and that I need to figure out how to add it myself.

To add the line, I'm passing each of my files to the following function:

from datetime import datetime

def append_prologue(file, orgID, schema):
    timestamp = datetime.today().strftime('%Y%m%d')
    new_name = f'{orgID}_000_2022TSDS_{timestamp}1500_' + schema
    new_file = file.parent.parent / 'results/with_prologue' / new_name
    if new_file.exists():
        print(f'{new_file.name} already exists')
    with open(file, 'r') as original:
        data = original.read()
        data = data[3:] #how the original writer dealt with the issue
    with open(new_file, 'w+') as modified:
        modified.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + data)
    return

However, this creates a problem. The file gets written, but the output contains "\ufeff", which I understand to be a BOM (byte order mark), and the XML file can't be read properly. I took over this project from a coworker who left my company, and they wrote this code. They addressed the issue by slicing off the first three characters to remove the BOM, but that doesn't seem to work for me. I also suspect there's a more systematic way of doing it.

What am I doing wrong? Is there a way to remove these characters when I write the file? Should I be approaching this differently?

quicks
  • 1
    Does this answer your question? [Convert UTF-8 with BOM to UTF-8 with no BOM in Python](https://stackoverflow.com/questions/8898294/convert-utf-8-with-bom-to-utf-8-with-no-bom-in-python) – Sören May 23 '22 at 19:56
  • 1
    **"...I'm using Microsoft SQL Server Management Studio to create an XML file...."** Please add to your question how you are doing it. – Yitzhak Khabinsky May 23 '22 at 20:02

1 Answer

The codecs module should do the trick. Its 'utf-8-sig' codec decodes UTF-8 and skips a leading BOM if one is present.

import codecs

# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present
StreamReader = codecs.getreader('utf-8-sig')
with StreamReader(open(file, 'rb')) as original:
    ...

Or a much shorter version:

with codecs.open(file, 'r', 'utf-8-sig') as original:
    ...
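
If you'd rather not use codecs at all, the built-in open() accepts the same codec name through its encoding argument. Below is a minimal sketch of the question's function rewritten that way. It assumes the SSMS export is UTF-8 with a BOM; if the file is actually UTF-16, the read encoding would need to change. The XML_PROLOGUE constant, the mkdir call, and returning the new path are additions for illustration, and the "already exists" check is left out for brevity.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical constant name; the prologue text is the one from the question.
XML_PROLOGUE = '<?xml version="1.0" encoding="UTF-8"?>'


def append_prologue(file: Path, orgID: str, schema: str) -> Path:
    timestamp = datetime.today().strftime('%Y%m%d')
    new_name = f'{orgID}_000_2022TSDS_{timestamp}1500_' + schema
    new_file = file.parent.parent / 'results/with_prologue' / new_name

    # Addition (assumption): make sure the output folder exists.
    new_file.parent.mkdir(parents=True, exist_ok=True)

    # 'utf-8-sig' skips a leading BOM when reading and is harmless
    # when the file has no BOM at all.
    with open(file, 'r', encoding='utf-8-sig') as original:
        data = original.read()

    # Plain 'utf-8' never writes a BOM, so the prologue ends up
    # as the very first bytes of the output file.
    with open(new_file, 'w', encoding='utf-8') as modified:
        modified.write(XML_PROLOGUE + data)

    return new_file
```

Slicing with data[3:] is fragile because the UTF-8 BOM is three bytes on disk (EF BB BF) but a single character (U+FEFF) once decoded as UTF-8, so the number of characters to drop depends on the encoding used for reading. Letting the codec handle it avoids that guess.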
Jinksy