
I'm trying to store tweets in SQL Server; they are UTF-8. Apparently SQL Server 2012 won't store UTF-8 in an nvarchar column. Instead, SQL Server recommends using UCS-2, according to this.

My whole script is in Python 3.3; it transforms a series of .json files into a single tabular file and then bulk loads it.

import csv
import json
import os

with open(fileName, "a+", encoding='utf-16') as the_file:
    writer = csv.writer(the_file, delimiter='\t', lineterminator='\n')
    for file in os.listdir(input):
        jsonData = open(os.path.join(input, file), encoding='utf-8')
        data = json.load(jsonData)
        for tweetObject in data:
            # parse tweetObject into tweetData here...
            writer.writerow(tweetData)
        jsonData.close()

Now clearly, UCS-2 isn't a default encoding, so where do I get it? Will I need to encode each line of UTF-8 in my file myself, or will Python do that automatically if I open the file with a UCS-2 encoding? I see plenty of advice on Google about how to read this encoding, but none on how to write with it.

I'm open to other suggestions if you have them. Thanks!

EDIT: Updated the code above to working code!

dreyco676

1 Answer


This worked:

How to write UTF-8 characters using bulk insert in SQL Server?

Basically, I needed to convert my input file to UTF-16 and use an nvarchar column.
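A minimal sketch of that approach, for anyone who wants the details: write the tab-delimited file in UTF-16, then bulk-load it with `DATAFILETYPE = 'widechar'` so SQL Server reads it as wide characters into an nvarchar column. The table name, file path, and sample rows below are hypothetical.

```python
import csv

# Sample rows standing in for parsed tweet data (hypothetical values).
rows = [["123", "héllo wörld 😀"], ["124", "plain ascii"]]

# Writing with encoding='utf-16' makes Python encode each row automatically;
# no per-line encoding needed.
with open("tweets.tsv", "w", encoding="utf-16", newline="") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

# The matching T-SQL: DATAFILETYPE = 'widechar' tells BULK INSERT the file
# is UTF-16 (table and path are assumptions, not from the original post).
bulk_insert_sql = """
BULK INSERT dbo.tweets
FROM 'C:\\data\\tweets.tsv'
WITH (DATAFILETYPE = 'widechar',
      FIELDTERMINATOR = '\\t',
      ROWTERMINATOR = '\\n');
"""
```

Note that the `csv` module handles the encoding transparently once the file object is opened as UTF-16; the BOM written at the start of the file is also what lets SQL Server detect the byte order.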
