1

I need to persist data in a database with powershell. Occasionally the data contains an emoji.

DB: On the DB side everything should be fine. The attributes are set to NVARCHAR which makes it possible to persist emojis. When I inserted the data manually the emoji got displayed after I query them(). I tested it with example data in SSMS and it worked perfectly.

Powershell: When preparing the SQL Statement in Powershell I noticed that the emojis are interpreted in UTF8 (ðŸ¤ðŸ’°). Basically gibberish.

Is a conversion from UTF8 to Unicode even necessary? How can I persist the emojis as and not as ðŸ¤ðŸ’°/1f600

Redman
  • 37
  • 9
  • 1
    The phrase "conversion from UTF8 to Unicode" is meaningless - Unicode is the abstract list of code points, and UTF8 one of the ways of representing those code points in binary form. What is presumably happening is either double-encoding, or misinterpretation of the binary format as a different encoding. – IMSoP Oct 14 '20 at 11:39
  • Thanks for the clarification. Nonetheless the emojis get represented in UTF-8 in powershell. I'd like to get the emoji representation persisted and not the UTF8/Unicode values – Redman Oct 14 '20 at 11:45
  • 1
    No, they don't "get represented in UTF-8", and there is no such thing as "the emoji represntation" - the database isn't storing little images, it's storing the characters in some binary encoding, probably either UTF-8 or UTF-16. What you are seeing is that binary representation being misinterpreted, and displayed as, as you say, gibberish. This may all sound pedantic, but getting a better understanding the concepts involved will help you identify and fix your problem. – IMSoP Oct 14 '20 at 11:50
  • Does this answer your question? [Using UTF-8 Encoding (CHCP 65001) in Command Prompt / Windows Powershell (Windows 10)](https://stackoverflow.com/questions/57131654/using-utf-8-encoding-chcp-65001-in-command-prompt-windows-powershell-window) – iRon Oct 14 '20 at 11:58
  • @IMSoP "Unicode" is often used incorrectly by Microsoft to refer to UTF-16, due to the history of how Unicode was implemented by Windows. This has caused no end of confusion. – Mark Ransom Oct 14 '20 at 15:13
  • @MarkRansom That's true. I'll bear that in mind when explaining in future. – IMSoP Oct 14 '20 at 15:46
  • The gibberish looks like utf8 being misinterpreted as ascii. – js2010 Oct 14 '20 at 19:04

2 Answers2

1

My colleague had the correct answer to this problem.

To persist emojis in a MS SQL Database you need to declare the column as nvarchar(max) (max is not necessarily), which I already did.

I tried to persist example data which I had hardcoded in my PS Script like this

@{ description = "Example Description }

Apparently VS Code adds some kind of encoding on top of the data(our guess).

What basically solved the issue was simply requesting the data from the API and persist it into the database with prefix string literal with N + the nvarchar(max) column datatype

Example: SET @displayNameFormatted = N'"+$displayName+"' And then include that variable in my insert statement.

Redman
  • 37
  • 9
0

Does this answer your question? "Use NVARCHAR(size) datatype and prefix string literal with N:" Add emoji / emoticon to SQL Server table

1 Emoji in powershell is 2 utf16 surrogate characters, since the code for it is too high for 16 bits. Surrogates and Supplementary Characters

''.length
2

Microsoft has made "unicode" a confusing term, since that's what they call utf16 le encoding.

Powershell 5.1 doesn't recognize utf8 no bom encoded scripts automatically.

We don't know what your commandline actually is, but see also: Unicode support for Invoke-Sqlcmd in PowerShell

js2010
  • 23,033
  • 6
  • 64
  • 66