
I'm trying to store tweets in SQL Server; they are UTF-8. Apparently SQL Server 2012 won't store UTF-8 in an nvarchar column. Instead, SQL Server recommends using UCS-2, according to this.

My whole script is in Python 3.3; it transforms a series of .json files into a single tabular file and then bulk loads it.

import csv
import json
import os

with open(fileName, "a+", encoding='utf-16') as the_file:
    writer = csv.writer(the_file, delimiter='\t', lineterminator='\n')
    for file in os.listdir(input):
        jsonData = open(os.path.join(input, file), encoding='utf-8')
        data = json.load(jsonData)
        for tweetObject in data:
            # parse tweetObject into tweetData here...
            writer.writerow(tweetData)
        jsonData.close()

Now clearly, UCS-2 isn't a default encoding, so where do I get it? Will I need to encode each line of UTF-8 in my file myself, or will Python do that automatically if I open the file with a UCS-2 encoding? I see plenty of advice on Google about how to read this encoding, but none on how to write with it.

I'm open to other suggestions if you have them. Thanks!

EDIT: Updated the code above to working code!

dreyco676

1 Answer


This worked:

How to write UTF-8 characters using bulk insert in SQL Server?

Basically, I needed to convert my input file to UTF-16 and use an nvarchar column.
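A minimal sketch of that approach, for anyone who wants the details: write the tab-delimited file in UTF-16, then bulk-load it with `DATAFILETYPE = 'widechar'` so SQL Server reads it as wide characters into an nvarchar column. The table name, file path, and sample rows below are hypothetical.

```python
import csv

# Sample rows standing in for parsed tweet data (hypothetical values).
rows = [["123", "héllo wörld 😀"], ["124", "plain ascii"]]

# Writing with encoding='utf-16' makes Python encode each row automatically;
# no per-line encoding needed.
with open("tweets.tsv", "w", encoding="utf-16", newline="") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

# The matching T-SQL: DATAFILETYPE = 'widechar' tells BULK INSERT the file
# is UTF-16 (table and path are assumptions, not from the original post).
bulk_insert_sql = """
BULK INSERT dbo.tweets
FROM 'C:\\data\\tweets.tsv'
WITH (DATAFILETYPE = 'widechar',
      FIELDTERMINATOR = '\\t',
      ROWTERMINATOR = '\\n');
"""
```

Note that the `csv` module handles the encoding transparently once the file object is opened as UTF-16; the BOM written at the start of the file is also what lets SQL Server detect the byte order.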
