I'm trying to read a SQL file, but it keeps giving me this error:

UnicodeError: UTF-16 stream does not start with BOM

I've created a function specifically to read SQL files:

import pandas as pd
import pyodbc as db
import os
import codecs

def sql_reader_single(qry_file, server_name, database, encoding='utf16'):
    server = db.connect(str('DRIVER={SQL Server};SERVER='+server_name+';DATABASE='+database+';'))
    with codecs.open(qry_file, encoding=encoding) as qf:
        data = pd.read_sql(qf.read(), server)
    return data

Then I called it to read the data:

Data = sp.sql_reader_single(qry_file=QryFile, server_name='my_server', database='my_db')

What am I doing wrong?

I've looked into:

utf-16 file seeking in python. how?

and tried both utf-16-le and utf-16-be, but then I get an error full of Japanese/Chinese characters, like this:

pandas.io.sql.DatabaseError: Execution failed on sql '䕓䕌呃ഠ 楤瑳湩瑣਍††⨠਍†剆䵏䔠坄䔮坄䘮捡剴捥楥楶杮潇摯⁳牦൧': ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near '0x0a0d'. (102) (SQLExecDirectW)")

The SQL file contains a very simple query, like this:

SELECT distinct *
  FROM FactReceiving
clinomaniac
alwaysaskingquestions

2 Answers

Try reading the file as UTF-8 instead of UTF-16.
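
As a quick check of why UTF-8 is a plausible guess here, the sketch below decodes plain ASCII/UTF-8 bytes as UTF-16-LE. Each pair of one-byte characters is merged into a single 16-bit code unit, which yields the same first three characters that appear in the error message in the question. The variable names are only illustrative.

```python
# The first word of the query, stored as ASCII/UTF-8 (one byte per character)
raw = "SELECT".encode("utf-8")

# Decoding as UTF-16-LE pairs the bytes up: (0x53, 0x45) -> U+4553, and so on
garbled = raw.decode("utf-16-le")
print(garbled)  # 䕓䕌呃

# Undoing the wrong decode recovers the original text
recovered = garbled.encode("utf-16-le").decode("utf-8")
print(recovered)  # SELECT
```

If the garbled SQL in the traceback round-trips back to readable text this way, the file on disk is very likely single-byte ASCII/UTF-8 rather than UTF-16.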

clinomaniac
  • The reason I used utf-16 is that previously my query wouldn't be read as utf-8 and only worked with utf-16. I'm very confused about why it sometimes works with 16 and sometimes with 8; it's not like I write in two different languages... But I still really appreciate the help! – alwaysaskingquestions Feb 26 '18 at 21:58
  • I am not sure how the files were created or how they might differ. Usually the first few bytes of a file, called the BOM (byte order mark), tell the encodings apart. If you open a file in Notepad++, you can choose which encoding to use. – clinomaniac Feb 26 '18 at 22:00
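
The BOM check described in the comments can be sketched in Python as follows. This is only a rough illustration, not a full encoding detector: the helper names `detect_encoding` and `read_query` are hypothetical, and falling back to UTF-8 when no BOM is present is an assumption that may not hold for every file.

```python
import codecs

# BOM signatures, longest first so UTF-32-LE is not mistaken for UTF-16-LE.
# The "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM themselves.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def detect_encoding(raw, default="utf-8"):
    """Return a codec name based on the BOM, or `default` if there is none."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default  # assumption: BOM-less files are UTF-8

def read_query(path, default="utf-8"):
    """Read a query file as text, choosing the codec from its BOM."""
    with open(path, "rb") as f:
        raw = f.read()
    return raw.decode(detect_encoding(raw, default))
```

The text returned by `read_query(qry_file)` could then be passed to `pd.read_sql` in place of `qf.read()`, so the same function handles files saved with or without a BOM.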

I used errors='ignore' with the utf-8 encoding so that bytes that cannot be decoded do not stop processing.

def get_text(file_name):
    with open(file_name, 'r', encoding='utf-8', errors='ignore') as f:
        text = f.read()
    return text
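
One caveat worth checking before relying on this: errors='ignore' drops undecodable bytes silently, so the query text can change without any warning. The small sketch below uses made-up sample bytes to show the effect, and compares it with errors='replace', which at least leaves a visible marker.

```python
# 0xE9 is 'é' in Latin-1 but is not valid UTF-8 in this position
raw = b"SELECT caf\xe9 FROM t"

ignored = raw.decode("utf-8", errors="ignore")    # the bad byte disappears
replaced = raw.decode("utf-8", errors="replace")  # the bad byte becomes U+FFFD

# A UTF-16 file read this way loses its BOM but keeps the NUL bytes
utf16_raw = b"\xff\xfe" + "SELECT".encode("utf-16-le")
nul_laden = utf16_raw.decode("utf-8", errors="ignore")
```

In the last case the result still contains a NUL character after every letter, so a UTF-16 file read this way would probably not run as SQL.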
Golden Lion